Abstract. We introduce estimation and test procedures through divergence minimization for
models satisfying linear constraints with unknown parameter. Several statistical examples and
motivations are given. These procedures extend the empirical likelihood (EL) method and share
common features with generalized empirical likelihood (GEL). We treat the problems of existence
and characterization of the divergence projections of probability measures on sets of signed
finite measures. Our approach allows for a study of the estimates under misspecification. The
asymptotic behavior of the proposed estimates is studied using the dual representation of the
divergences and the explicit forms of the divergence projections. We discuss the choice of the
divergence from several points of view. We also address efficiency and robustness properties
of minimum divergence estimates. A simulation study shows that the Hellinger divergence
enjoys good efficiency and robustness properties.

1. Introduction and notation

A model satisfying partly specified linear parametric constraints is a family of distributions $\mathcal{M}^1$,
all defined on the same measurable space $(\mathcal{X}, \mathcal{B})$, such that, for all $Q$ in $\mathcal{M}^1$, the following condition
holds

$\int g(x,\theta)\, dQ(x) = 0.$

The unspecified parameter $\theta$ belongs to $\Theta$, an open set in $\mathbb{R}^d$. The function $g := (g_1, \ldots, g_l)^T$ is
defined on $\mathcal{X} \times \Theta$ with values in $\mathbb{R}^l$, each of the $g_i$'s being real valued, and the functions $g_1, \ldots, g_l, 1_{\mathcal{X}}$
are assumed linearly independent. So $\mathcal{M}^1$ is defined through $l$ linear constraints indexed by some
$d$-dimensional parameter $\theta$. Denote by $M^1$ the collection of all probability measures on $(\mathcal{X}, \mathcal{B})$, and

$\mathcal{M}^1_\theta := \left\{ Q \in M^1 \text{ such that } \int g(x,\theta)\, dQ(x) = 0 \right\}$

so that

(1.1) $\mathcal{M}^1 = \bigcup_{\theta \in \Theta} \mathcal{M}^1_\theta.$

Assume now that we have at hand a sample $X_1, \ldots, X_n$ of independent random variables (r.v.'s)
with common unknown distribution $P_0$. When $P_0$ belongs to the model (1.1), we denote by $\theta_0$ the
value of the parameter $\theta$ such that $\mathcal{M}^1_{\theta_0}$ contains $P_0$. Obviously, we assume that $\theta_0$ is unique.

The scope of this paper is to propose new answers to the following classical problems.
Problem 1: Does $P_0$ belong to the model $\mathcal{M}^1$?
Problem 2: When $P_0$ is in the model, which is the value $\theta_0$ of the parameter for which
$\int g(x,\theta_0)\, dP_0(x) = 0$? Also, can we perform simple and composite tests for $\theta_0$? Can we construct
confidence areas for $\theta_0$? Can we give more efficient estimates for the distribution function than
the usual empirical cumulative distribution function (c.d.f.)?

We present some examples and motivations for the model (1.1) and Problems 1 and 2.
1.1. Statistical examples and motivations. 

Example 1.1. Suppose that $P_0$ is the distribution of a pair of random variables $(X, Y)$ on a
product space $\mathcal{X} \times \mathcal{Y}$ with known marginal distributions $P_1$ and $P_2$. Bickel et al. (1991) study
efficient estimation of $\theta = \int h(x,y)\, dP_0(x,y)$ for a specified function $h$. This problem can be
handled in the present context when the spaces $\mathcal{X}$ and $\mathcal{Y}$ are discrete and finite. Denote
$\mathcal{X} = \{x_1, \ldots, x_k\}$ and $\mathcal{Y} = \{y_1, \ldots, y_r\}$. Consider an i.i.d. bivariate sample $(X_i, Y_i)$, $1 \le i \le n$, of
the bivariate random variable $(X, Y)$. The space $\mathcal{M}^1_\theta$ in this case is the set of all p.m.'s $Q$ on
$\mathcal{X} \times \mathcal{Y}$ satisfying $\int g(x,y,\theta)\, dQ(x,y) = 0$, where $g = (g^{(1)}_1, \ldots, g^{(1)}_k, g^{(2)}_1, \ldots, g^{(2)}_r, g_l)^T$,
$g^{(1)}_i(x,y,\theta) = 1_{\{x_i\} \times \mathcal{Y}}(x,y) - P_1(x_i)$ and $g^{(2)}_j(x,y,\theta) = 1_{\mathcal{X} \times \{y_j\}}(x,y) - P_2(y_j)$
for all $(i,j) \in \{1, \ldots, k\} \times \{1, \ldots, r\}$, and $g_l(x,y,\theta) = h(x,y) - \theta$. Problem 1 becomes the test
for "$P_0$ belongs to $\bigcup_{\theta \in \Theta} \mathcal{M}^1_\theta$", while Problem 2 pertains to estimation and tests for specific
values of $\theta$. Motivation and references for this problem are given in Bickel et al. (1991).

Example 1.2. (Generalized linear models). Let $Y$ be a random variable and $X$ an $l$-dimensional
random vector. $Y$ and $X$ are linked through

$Y = m(X, \theta_0) + \varepsilon$

in which $m(\cdot,\cdot)$ is some specified real valued function and $\theta_0$, the parameter of interest, belongs to
some open set $\Theta \subset \mathbb{R}^d$; $\varepsilon$ is a measurement error. Denote by $P_0$ the law of the vector variable $(X, Y)$
and suppose that the true value $\theta_0$ satisfies the orthogonality condition

$\int x\,(y - m(x,\theta_0))\, dP_0(x,y) = 0.$

Consider an i.i.d. sample $(X_i, Y_i)$, $1 \le i \le n$, of r.v.'s with the same distribution as $(X, Y)$. The
existence of some $\theta_0$ for which the above condition holds is given as the solution of Problem 1,
while Problem 2 aims to provide its explicit value: here $\mathcal{M}^1_\theta$ is the set of all p.m.'s $Q$ on $\mathbb{R}^{l+1}$
satisfying $\int g(x,y,\theta)\, dQ(x,y) = 0$ with $g(x,y,\theta) = x\,(y - m(x,\theta))$.



Qin and Lawless (1994) introduce various interesting examples when (1.1) applies. In their
Example 1, they consider the existence and estimation of the expectation $\theta$ of some r.v. $X$ when
$E(X^2) = m(\theta)$ for some known function $m(\cdot)$. Another example is when a bivariate sample
$(X_i, Y_i)$ of i.i.d. r.v.'s is observed, the expectation of $X_i$ is known and we intend to estimate $E(Y_i)$.
Haberman (1984) and Sheehy (1987) consider estimation of $F(x)$ based on an i.i.d. sample $X_1, \ldots, X_n$
with distribution function $F$ when it is known that $\int T(x)\, dF(x) = a$, for some specified function
$T(\cdot)$. For this problem, the function $g(x,\theta)$ in the model (1.1) is equal to $T(x) - \theta$ where $\theta = a$ is
known. This example with $a$ unknown is treated in detail in Section 3 of the present paper. We
refer to Owen (2001) for more statistical examples when model (1.1) applies.



Another motivation for our work stems from confidence region (C.R.) estimation techniques. The
empirical likelihood method provides such estimation (see Owen (1990)). We will extend this
approach, providing a wide range of such C.R.'s, each one depending upon a specific criterion, one of
those leading to Owen's C.R.

An important estimator of $\theta_0$ is the generalized method of moments (GMM) estimator of Hansen
(1982). The empirical likelihood approach developed by Owen (1988) and Owen (1990) has been
adapted to the present setting by Qin and Lawless (1994) and Imbens (1997), introducing the
empirical likelihood estimator (EL). The recent literature in econometrics focuses on such models;
the paper by Newey and Smith (2004) provides an exhaustive list of works dealing with the
statistical properties of GMM and generalized empirical likelihood (GEL) estimators.



Our interest also lies in the behavior of the estimates under misspecification. In the context of
tests of hypotheses, the statistic to be considered is some estimate of some divergence between
the unknown distribution of the data and the model. We are also motivated by the behavior
of those statistics under misspecification, i.e., when the model is not appropriate to the data.
Such questions have not been addressed until now for those problems in the general context of
divergences. Schennach (2007) considers the asymptotic properties of the empirical likelihood
estimate under misspecification. As a by-product, we will prove that our proposal leads to consistent
test procedures; furthermore, the asymptotic behavior of the statistics, under $\mathcal{H}_1$, provides the
fundamental tool in order to achieve Bahadur efficiency calculations (see Nikitin (1995)).



An important result due to Newey and Smith (2004) states that the EL estimate enjoys optimality
properties in terms of efficiency when bias corrected among all GEL and GMM estimators. Also,
Corcoran (1998) and Baggerly (1998) proved that, in a class of minimum discrepancy statistics, the EL
ratio is the only one that is Bartlett correctable. However, these results do not consider the optimality
properties of the tests for Problems 1 and 2. Also, in connection with the estimation problem, they do
not consider the properties of the EL estimate with respect to robustness. So, the question regarding
divergence-based methods remains open at least in these two instances.



The approach which we develop is based on minimum discrepancy estimates, which have common
features with minimum distance techniques, using merely divergences. We present wide sets of
estimates, simple and composite tests and confidence regions for the parameter $\theta_0$, as well as various
test statistics for Problem 1, all depending on the choice of the divergence. Simulations show that
the approach based on the Hellinger divergence enjoys good robustness and efficiency properties when
handling Problem 2. As presented in Section 5, empirical likelihood methods appear to be a special
case of the present approach.



1.2. Minimum divergence estimates. We first set some general definitions and notation. Let $P$
be some probability measure (p.m.). Denote by $M^1(P)$ the subset of all p.m.'s which are absolutely
continuous (a.c.) with respect to $P$. Denote by $M$ the space of all signed finite measures on $(\mathcal{X}, \mathcal{B})$
and by $M(P)$ the subset of all signed finite measures a.c. w.r.t. $P$. Let $\varphi$ be a convex function from
$[-\infty, +\infty]$ onto $[0, +\infty]$ with $\varphi(1) = 0$. For any signed finite measure $Q$ in $M(P)$, the $\phi$-divergence
between $Q$ and the p.m. $P$ is defined through

(1.2) $\phi(Q, P) := \int \varphi\left(\frac{dQ}{dP}\right) dP.$

When $Q$ is not a.c. w.r.t. $P$, we set $\phi(Q, P) = +\infty$. This definition extends that of Rüschendorf (1984),
which applies for divergences between p.m.'s; it also differs from that of Csiszár (1963), which
requires a common dominating $\sigma$-finite measure, noted $\lambda$, for $Q$ and $P$. Since we will consider
subsets of $M^1(P)$ and subsets of $M(P)$, it is more adequate for our sake to use the definition (1.2).
Also note that all the just mentioned definitions of $\phi$-divergences coincide on the set of all p.m.'s
a.c. w.r.t. $P$ and dominated by $\lambda$.

For all p.m. $P$, the mappings $Q \in M \mapsto \phi(Q, P)$ are convex and take nonnegative values. When
$Q = P$ then $\phi(Q, P) = 0$. Further, if the function $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of
$x = 1$, then the following basic property holds

(1.3) $\phi(Q, P) = 0$ if and only if $Q = P$.

All these properties are presented in Csiszár (1963), Csiszár (1967) and Liese and Vajda (1987)
Chapter 1, for $\phi$-divergences defined on the set of all p.m.'s $M^1$. When the $\phi$-divergences are
defined on $M$, then the same arguments as developed on $M^1$ hold.

When defined on $M^1$, the Kullback-Leibler ($KL$), modified Kullback-Leibler ($KL_m$), $\chi^2$, modified
$\chi^2$ ($\chi^2_m$), Hellinger ($H$), and $L^1$ divergences are respectively associated to the convex functions
$\varphi(x) = x \log x - x + 1$, $\varphi(x) = -\log x + x - 1$, $\varphi(x) = \frac{1}{2}(x-1)^2$, $\varphi(x) = \frac{1}{2}(x-1)^2/x$,
$\varphi(x) = 2(\sqrt{x} - 1)^2$ and $\varphi(x) = |x - 1|$. All those divergences, except the $L^1$ one, belong to the class of
power divergences introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) Chapter
2). They are defined through the class of convex functions

(1.4) $x \in \mathbb{R}^*_+ \mapsto \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}$

if $\gamma \in \mathbb{R} \setminus \{0, 1\}$, and by $\varphi_0(x) := -\log x + x - 1$ and $\varphi_1(x) := x \log x - x + 1$. For all $\gamma \in \mathbb{R}$,
$\varphi_\gamma(0) := \lim_{x \downarrow 0} \varphi_\gamma(x)$ and $\varphi_\gamma(+\infty) := \lim_{x \uparrow +\infty} \varphi_\gamma(x)$. So, the $KL$-divergence is associated to $\varphi_1$,
the $KL_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$ and the Hellinger distance to $\varphi_{1/2}$. For all $\gamma \in \mathbb{R}$, we
sometimes denote by $\phi_\gamma$ the divergence associated to the convex function $\varphi_\gamma$. We define the derivative
of $\varphi_\gamma$ at 0 by $\varphi'_\gamma(0) := \lim_{x \downarrow 0} \varphi'_\gamma(x)$. We extend the definition of the power divergence functions
$Q \in M^1 \mapsto \phi_\gamma(Q, P)$ onto the whole set of signed finite measures $M$ as follows. When the function
$x \mapsto \varphi_\gamma(x)$ is not defined on $(-\infty, 0[$, or when $\varphi_\gamma$ is defined on $\mathbb{R}$ but is not a convex function, we
extend the definition of $\varphi_\gamma$ through

(1.5) $x \in [-\infty, +\infty] \mapsto \varphi_\gamma(x)\, 1_{[0,+\infty]}(x) + \left(\varphi'_\gamma(0)\, x + \varphi_\gamma(0)\right) 1_{[-\infty,0[}(x).$
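The correspondence between the index $\gamma$ in (1.4) and the named divergences can be checked numerically. The following is a minimal sketch, restricted to $x > 0$ (the extension (1.5) to negative arguments is deliberately not implemented); the function names `phi_gamma` and `named_generators` are illustrative and do not come from the paper.

```python
import math

def phi_gamma(x: float, gamma: float) -> float:
    """Generator of the power divergence of index gamma, see (1.4); x > 0 only."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    if gamma == 0.0:   # modified Kullback-Leibler (KL_m)
        return -math.log(x) + x - 1.0
    if gamma == 1.0:   # Kullback-Leibler (KL)
        return x * math.log(x) - x + 1.0
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def named_generators(x: float) -> dict:
    """Closed forms quoted in the text, keyed by the matching index gamma."""
    return {
        2.0: 0.5 * (x - 1.0) ** 2,             # chi-square
        -1.0: 0.5 * (x - 1.0) ** 2 / x,        # modified chi-square
        0.5: 2.0 * (math.sqrt(x) - 1.0) ** 2,  # Hellinger
    }
```

Each generator vanishes at $x = 1$, in line with the requirement $\varphi(1) = 0$.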
For any convex function $\varphi$, define the domain of $\varphi$ through

(1.6) $D_\varphi := \{x \in [-\infty, +\infty] \text{ such that } \varphi(x) < +\infty\}.$

Since $\varphi$ is convex, $D_\varphi$ is an interval which may be open or not, bounded or unbounded. Hence,
write $D_\varphi := (a, b)$, in which $a$ and $b$ may be finite or infinite. In this paper, we will only consider $\varphi$
functions defined on $[-\infty, +\infty]$ with values in $[0, +\infty]$ such that $a < 1 < b$, and which satisfy $\varphi(1) = 0$,
are strictly convex and are $C^2$ on the interior of their domain $D_\varphi$; we define $\varphi(a)$, $\varphi'(a)$, $\varphi''(a)$, $\varphi(b)$,
$\varphi'(b)$ and $\varphi''(b)$ respectively by $\varphi(a) := \lim_{x \downarrow a} \varphi(x)$, $\varphi'(a) := \lim_{x \downarrow a} \varphi'(x)$, $\varphi''(a) := \lim_{x \downarrow a} \varphi''(x)$,
$\varphi(b) := \lim_{x \uparrow b} \varphi(x)$, $\varphi'(b) := \lim_{x \uparrow b} \varphi'(x)$ and $\varphi''(b) := \lim_{x \uparrow b} \varphi''(x)$. These quantities may be
finite or infinite. All the functions $\varphi_\gamma$ (see (1.5)) satisfy these conditions.

Definition 1.1. Let $\Omega$ be some subset in $M$. The $\phi$-divergence between the set $\Omega$ and a p.m. $P$,
noted $\phi(\Omega, P)$, is

$\phi(\Omega, P) := \inf_{Q \in \Omega} \phi(Q, P).$

Definition 1.2. Assume that $\phi(\Omega, P)$ is finite. A measure $Q^* \in \Omega$ such that

$\phi(Q^*, P) \le \phi(Q, P)$ for all $Q \in \Omega$

is called a $\phi$-projection of $P$ onto $\Omega$. This projection may not exist, or may not be uniquely defined.

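We will make use of the concept of $\phi$-divergences in order to perform estimation and tests for the
model (1.1). So, let $X_1, \ldots, X_n$ denote an i.i.d. sample of r.v.'s with common distribution $P_0$. Let
$P_n$ be the empirical measure pertaining to this sample, namely

$P_n := \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$

in which $\delta_x$ is the Dirac measure at point $x$. When $P_0$ and all $Q \in \mathcal{M}^1$ share the same discrete
finite support $S$, then the $\phi$-divergence $\phi(Q, P_0)$ can be written as

(1.7) $\phi(Q, P_0) = \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_0(j)}\right) P_0(j).$

In this case, $\phi(Q, P_0)$ can be estimated simply through the plug-in of $P_n$ in (1.7), as follows

(1.8) $\widehat{\phi}(Q, P_0) := \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j).$

In the same way, for any $\theta$ in $\Theta$, $\phi(\mathcal{M}^1_\theta, P_0)$ is estimated by

(1.9) $\widehat{\phi}(\mathcal{M}^1_\theta, P_0) := \inf_{Q \in \mathcal{M}^1_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j),$

and $\phi(\mathcal{M}^1, P_0) = \inf_{\theta \in \Theta} \phi(\mathcal{M}^1_\theta, P_0)$ can be estimated by

(1.10) $\widehat{\phi}(\mathcal{M}^1, P_0) := \inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}^1_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j).$

By uniqueness of $\arg\inf_{\theta \in \Theta} \phi(\mathcal{M}^1_\theta, P_0)$, and since this infimum is reached at $\theta = \theta_0$, we estimate $\theta_0$
through

(1.11) $\widehat{\theta}_\phi := \arg\inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}^1_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j).$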
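For a fixed candidate measure $Q$ on a finite support, the plug-in estimate (1.8) is a finite sum over the observed support points. The sketch below is illustrative only: it assumes the $\chi^2$ generator $\varphi(x) = \frac{1}{2}(x-1)^2$ as a default, represents $Q$ as a dictionary of point masses, and returns $+\infty$ when $Q$ charges a point that the sample never visits (lack of absolute continuity with respect to $P_n$). The names `chi2_generator` and `plug_in_divergence` are hypothetical.

```python
from collections import Counter
import math

def chi2_generator(x: float) -> float:
    """Chi-square generator phi(x) = (x - 1)^2 / 2."""
    return 0.5 * (x - 1.0) ** 2

def plug_in_divergence(q: dict, sample: list, generator=chi2_generator) -> float:
    """Plug-in estimate (1.8): sum over j of generator(Q(j) / P_n(j)) * P_n(j)."""
    n = len(sample)
    p_n = {j: count / n for j, count in Counter(sample).items()}
    # Q charges a point outside the support of P_n: Q is not a.c. w.r.t. P_n.
    if any(mass != 0.0 and j not in p_n for j, mass in q.items()):
        return math.inf
    return sum(generator(q.get(j, 0.0) / pj) * pj for j, pj in p_n.items())
```

The estimates (1.9) to (1.11) then require minimizing this quantity over $Q$ and $\theta$, which the sketch does not attempt.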

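The infimum in (1.9) (i.e., the projection of $P_n$ on $\mathcal{M}^1_\theta$) may be achieved on the frontier of $\mathcal{M}^1_\theta$.
In this case the Lagrange method is not valid. Hence, we embed our statistical approach in the
global context of signed finite measures with total mass 1 satisfying the linear constraints:

(1.12) $\mathcal{M}_\theta := \left\{ Q \in M \text{ such that } \int dQ = 1 \text{ and } \int g(x,\theta)\, dQ(x) = 0 \right\}$

and

(1.13) $\mathcal{M} := \bigcup_{\theta \in \Theta} \mathcal{M}_\theta,$

sets of signed finite measures that replace $\mathcal{M}^1_\theta$ and $\mathcal{M}^1$.

As above, we estimate $\phi(\mathcal{M}_\theta, P_0)$, $\phi(\mathcal{M}, P_0)$ and $\theta_0$ respectively by

(1.14) $\widehat{\phi}(\mathcal{M}_\theta, P_0) := \inf_{Q \in \mathcal{M}_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j),$

(1.15) $\widehat{\phi}(\mathcal{M}, P_0) := \inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j),$

and

(1.16) $\widehat{\theta}_\phi := \arg\inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}_\theta} \sum_{j \in S} \varphi\left(\frac{Q(j)}{P_n(j)}\right) P_n(j).$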

Enhancing $\mathcal{M}^1$ to $\mathcal{M}$ is motivated by the following arguments:
- For all $\theta$ in $\Theta$, denote by $Q^1_\theta$ and $Q^*_\theta$ respectively the projection of $P_n$ on $\mathcal{M}^1_\theta$ and on $\mathcal{M}_\theta$, as
defined in (1.9) and in (1.14). If $Q^1_\theta$ is an interior point of $\mathcal{M}_\theta$, then, by Proposition 2.5
below, it coincides with $Q^*_\theta$, the projection of $P_n$ on $\mathcal{M}_\theta$, i.e., $Q^1_\theta = Q^*_\theta$. Therefore, in this
case, both approaches coincide.

- It may occur that for some $\theta$ in $\Theta$, $Q^1_\theta$, the projection of $P_n$ on $\mathcal{M}^1_\theta$, is a frontier point of
$\mathcal{M}^1_\theta$, which creates a real difficulty for the estimation procedure. We will prove in Theorem
3.4 that $\widehat{\theta}_\phi$, defined in (1.16) and which replaces (1.11), converges to $\theta_0$. This validates
the substitution of the sets $\mathcal{M}^1_\theta$ by the sets $\mathcal{M}_\theta$. In the context of a test problem, we will
prove that the asymptotic distributions of the test statistics pertaining to Problems 1 and
2 are unaffected by this change.

This modification motivates the above extensions in the definitions of the $\varphi$ functions on $[-\infty, +\infty]$
and of the $\phi$-divergences on the whole space of finite signed measures $M$.



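In the case when $Q$ and $P_0$ have different discrete finite supports, or have the same or different discrete
infinite or continuous supports, formula (1.8) is not defined, due to lack of absolute continuity
of $Q$ with respect to $P_n$. Indeed

(1.17) $\widehat{\phi}(Q, P_0) := \phi(Q, P_n) = +\infty.$

The plug-in estimate of $\phi(\mathcal{M}_\theta, P_0)$ is

(1.18) $\widehat{\phi}(\mathcal{M}_\theta, P_0) := \inf_{Q \in \mathcal{M}_\theta} \phi(Q, P_n) = \inf_{Q \in \mathcal{M}_\theta} \int \varphi\left(\frac{dQ}{dP_n}(x)\right) dP_n(x).$

If the infimum exists, then it is clear that it is reached at a signed finite measure (or probability
measure) which is a.c. w.r.t. $P_n$. So, define the sets

(1.19) $\mathcal{M}^{(n)}_\theta := \left\{ Q \in M \text{ such that } Q \ll P_n,\ \sum_{i=1}^{n} Q(X_i) = 1 \text{ and } \sum_{i=1}^{n} Q(X_i)\, g(X_i, \theta) = 0 \right\},$

which may be seen as subsets of $\mathbb{R}^n$. Then, the plug-in estimate (1.18) of $\phi(\mathcal{M}_\theta, P_0)$ can be written
as

(1.20) $\widehat{\phi}(\mathcal{M}_\theta, P_0) = \inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n} \sum_{i=1}^{n} \varphi\left(n Q(X_i)\right).$

In the same way, $\phi(\mathcal{M}, P_0) := \inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}_\theta} \phi(Q, P_0)$ can be estimated by

(1.21) $\widehat{\phi}(\mathcal{M}, P_0) = \inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n} \sum_{i=1}^{n} \varphi\left(n Q(X_i)\right).$

By uniqueness of $\arg\inf_{\theta \in \Theta} \phi(\mathcal{M}_\theta, P_0)$, and since this infimum is reached at $\theta = \theta_0$, we estimate $\theta_0$
through

(1.22) $\widehat{\theta}_\phi = \arg\inf_{\theta \in \Theta} \inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n} \sum_{i=1}^{n} \varphi\left(n Q(X_i)\right).$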

Note that, when $P_0$ and all $Q \in \mathcal{M}^1$ share the same discrete finite support, then the
estimates (1.22), (1.21) and (1.20) coincide respectively with (1.16), (1.15) and (1.14). Hence, in
the sequel, we study the estimates $\widehat{\phi}(\mathcal{M}_\theta, P_0)$, $\widehat{\phi}(\mathcal{M}, P_0)$ and $\widehat{\theta}_\phi$ as defined in (1.20), (1.21) and
(1.22), respectively. We propose to call the estimates $\widehat{\theta}_\phi$ defined in (1.22) "Minimum Empirical
$\phi$-Divergences Estimates" (ME$\phi$DE's). As will be noticed later on, the empirical likelihood
paradigm (see Owen (1988) and Owen (1990)), which is based on this plug-in approach, enters as a
special case of the statistical issues related to estimation and tests based on $\phi$-divergences with
$\varphi(x) = \varphi_0(x) = -\log x + x - 1$, namely on the $KL_m$-divergence. The empirical log-likelihood ratio
for the model (1.12), in the context of $\phi$-divergences, can be written as $-n\,\widehat{KL_m}(\mathcal{M}_\theta, P_0)$. In the
case of a single functional, for example when $g(x,\theta) = x - \theta$ with $x$ and $\theta$ in $\mathbb{R}$, Owen (1988)
shows that $2n\,\widehat{KL_m}(\mathcal{M}_\theta, P_0)$ has an asymptotic $\chi^2_{(1)}$ distribution when $P_0$ belongs to $\mathcal{M}_\theta$ (see
Owen (1988) Theorem 1). This result is a nonparametric version of Wilks's theorem (see Wilks
(1938)). In the multivariate case, the same result holds (see Owen (1990) Theorem 1). When
we want to extend the arguments used in Owen (1988) and Owen (1990) in order to study the
limiting behavior of the statistics $\widehat{\phi}(\mathcal{M}_\theta, P_0)$ when $P_0 \notin \mathcal{M}_\theta$ (for example, when $\theta_0 \ne \theta$), most
limiting arguments become intractable. We propose to use the so-called "dual representation of
$\phi$-divergences" (see Keziou (2003)), a device which is well known for the Kullback-Leibler
divergence in the context of large deviations, and which has been used in parametric statistics in Keziou
(2003) and Broniatowski and Keziou (2009). The estimates then turn out to be M-estimates whose
limiting distributions are obtained through classical methods. On the other hand, obtaining
the limit distributions of the statistics $\widehat{\phi}(\mathcal{M}_\theta, P_0)$ when $P_0 \notin \mathcal{M}_\theta$ requires the study of the
existence and the characterization of the projection of the p.m. $P_0$ on the sets $\mathcal{M}_\theta$.



This paper is organized as follows: In Section 3, we study the asymptotic behavior of the proposed
estimates (1.20), (1.21) and (1.22), giving solutions to Problem 2. We then address Problem 1,
namely: does there exist some $\theta_0$ in $\Theta$ for which $P_0$ belongs to $\mathcal{M}_{\theta_0}$? In Section 4, extending
the result by Qin and Lawless (1994), we give new estimates for the distribution function using
the $\phi$-projections of $P_n$ on the model $\mathcal{M}$. We show that the new estimates of the distribution
function are generally more efficient than the empirical cumulative distribution function. Section
5 illustrates the concept of empirical likelihood in the context of $\phi$-divergence techniques. In
Section 6, we focus on robustness and efficiency of the ME$\phi$D estimates. A simulation study aims
at emphasizing the specific advantage of the choice of the Hellinger divergence in relation with
robustness and efficiency considerations. All proofs are in Section 7.



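2. Estimation for Models satisfying Linear Constraints

At this point, we must introduce some notational conventions for the sake of brevity and clarity.
For any p.m. $P$ on $\mathcal{X}$ and any measurable real function $f$ on $\mathcal{X}$, $Pf$ denotes $\int f(x)\, dP(x)$. For
example, $P_0 g_j(\theta)$ will be used instead of $\int g_j(x,\theta)\, dP_0(x)$. Hence, we are led to define the following
functions: denote by $\bar{g}$ the function defined on $\mathcal{X} \times \Theta$ with values in $\mathbb{R}^{l+1}$ by

$(x,\theta) \mapsto \bar{g}(x,\theta) := (1_{\mathcal{X}}(x), g_1(x,\theta), \ldots, g_l(x,\theta))^T,$

and for all $\theta \in \Theta$, denote also by $\bar{g}(\theta)$, $g(\theta)$, $g_j(\theta)$ the functions defined respectively by

$\bar{g}(\theta) : \mathcal{X} \to \mathbb{R}^{l+1}$, $x \mapsto \bar{g}(x,\theta) := (g_0(x,\theta), g_1(x,\theta), \ldots, g_l(x,\theta))^T$, where $g_0(x,\theta) := 1_{\mathcal{X}}(x)$,

$g(\theta) : \mathcal{X} \to \mathbb{R}^{l}$, $x \mapsto g(x,\theta) := (g_1(x,\theta), \ldots, g_l(x,\theta))^T$

and

$g_j(\theta) : \mathcal{X} \to \mathbb{R}$, $x \mapsto g_j(x,\theta)$, for all $j \in \{0, 1, \ldots, l\}$.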



We now turn back to the setting defined in the Introduction and consider model (1.12). For fixed
$\theta$ in $\Theta$, define the class of functions

$\mathcal{F}_\theta := \{g_0(\theta), g_1(\theta), \ldots, g_l(\theta)\},$

and consider the set of finite signed measures $\mathcal{M}_\theta$ defined by $(l+1)$ linear constraints as defined
in (1.12)

$\mathcal{M}_\theta := \left\{ Q \in M_{\mathcal{F}_\theta} \text{ such that } \int dQ(x) = 1 \text{ and } \int g(x,\theta)\, dQ(x) = 0 \right\}.$

We present explicit tractable conditions for the estimates (1.20), (1.21) and (1.22) to be well
defined. This will be done in Proposition 2.1, Remark 2.1, Proposition 2.2 and Remark 2.3 below.
First, we present sufficient conditions which assess the existence of the infimum in (1.20), noted
$\widehat{Q^*_\theta}$, the projection of $P_n$ on $\mathcal{M}_\theta$. We also provide conditions under which the Lagrange method
can be used to characterize $\widehat{Q^*_\theta}$. The Fenchel-Legendre transform of $\varphi$ will be denoted $\varphi^*$, i.e.,

(2.1) $t \in \mathbb{R} \mapsto \varphi^*(t) := \sup_{x \in \mathbb{R}} \{tx - \varphi(x)\}.$

Define

(2.2) $D^{(n)}_\varphi := \left\{ Q \in M \text{ such that } Q \ll P_n \text{ and } \frac{1}{n} \sum_{i=1}^{n} \varphi(nQ(X_i)) < \infty \right\},$

i.e., the domain of the function

$(Q(X_1), \ldots, Q(X_n))^T \in \mathbb{R}^n \mapsto \frac{1}{n} \sum_{i=1}^{n} \varphi(nQ(X_i)).$

We have
We have 

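Proposition 2.1. Assume that there exists some measure $R$ in the interior of $D^{(n)}_\varphi$ and in $\mathcal{M}^{(n)}_\theta$
such that for all $Q$ in $\partial D^{(n)}_\varphi$, the frontier of $D^{(n)}_\varphi$, we have

(2.3) $\frac{1}{n} \sum_{i=1}^{n} \varphi(nR(X_i)) < \frac{1}{n} \sum_{i=1}^{n} \varphi(nQ(X_i)).$

Then the following holds:

(i) there exists a unique $\widehat{Q^*_\theta}$ in $\mathcal{M}^{(n)}_\theta$ such that

(2.4) $\inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n} \sum_{i=1}^{n} \varphi(nQ(X_i)) = \frac{1}{n} \sum_{i=1}^{n} \varphi\left(n \widehat{Q^*_\theta}(X_i)\right);$

(ii) $\widehat{Q^*_\theta}$ is an interior point of $D^{(n)}_\varphi$ and satisfies, for all $i = 1, \ldots, n$,

(2.5) $\widehat{Q^*_\theta}(X_i) = \frac{1}{n}\, \varphi'^{-1}\left(\widehat{c}_0 + \sum_{j=1}^{l} \widehat{c}_j\, g_j(X_i, \theta)\right),$

where $(\widehat{c}_0, \widehat{c}_1, \ldots, \widehat{c}_l)^T := \widehat{c}_\theta$ is a solution of the system of equations

(2.6) $\int \varphi'^{-1}\left(\widehat{c}_0 + \sum_{i=1}^{l} \widehat{c}_i\, g_i(x,\theta)\right) dP_n(x) = 1,$
$\qquad \int g_j(x,\theta)\, \varphi'^{-1}\left(\widehat{c}_0 + \sum_{i=1}^{l} \widehat{c}_i\, g_i(x,\theta)\right) dP_n(x) = 0, \quad j = 1, \ldots, l.$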

Example 2.1. For the $\chi^2$-divergence, we have $D^{(n)}_{\varphi_2} = \mathbb{R}^n$. Hence condition (2.3) holds whenever
$\mathcal{M}^{(n)}_\theta$ is not void. Therefore, the above Proposition always holds, independently of the distribution
$P_0$. More generally, the above Proposition holds for any $\phi$-divergence which is associated to a $\varphi$
function satisfying $D_\varphi = \mathbb{R}$. (See (1.6) for the definition of $D_\varphi$.)

Example 2.2. In the case of the modified Kullback-Leibler divergence, which turns out to coincide
with the empirical likelihood technique (see Section 5), we have $D^{(n)}_{\varphi_0} = (]0, +\infty[)^n$. For $\alpha$ in $\Theta$,
define the assertion

(2.7) there exists $q = (q_1, \ldots, q_n)$ in $\mathbb{R}^n$ with $0 < q_i < 1$ for all $i = 1, \ldots, n$
and $\sum_{i=1}^{n} q_i\, g_j(X_i, \alpha) = 0$ for all $j = 1, \ldots, l$.

A sufficient condition, in order to assess that condition (2.3) in the above Proposition holds, is
that (2.7) holds for $\alpha = \theta$. In the case when $g(x,\theta) = x - \theta$, this is precisely what is checked in
Owen (1990), p. 100, when $\theta$ is an interior point of the convex hull of $(X_1, \ldots, X_n)$.

Example 2.3. For the modified $\chi^2$-divergence, we have $D^{(n)}_{\varphi_{-1}} = (]0, \infty[)^n$; and therefore, condition
(2.7) for $\alpha = \theta$ is sufficient for the condition (2.3) to hold. So, conditions which assess the
existence of the projection $\widehat{Q^*_\theta}$ are the same for the modified $\chi^2$-divergence and the $KL_m$-divergence.
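For the $\chi^2$-divergence with a single scalar constraint ($l = 1$), the characterization of the projection by Lagrange coefficients can be solved by hand, since $\varphi'(x) = x - 1$ and hence $\varphi'^{-1}(t) = t + 1$. The following sketch is written under those assumptions only; `chi2_projection` is a hypothetical helper that takes the values $g_i = g(X_i, \theta)$ and returns the weights $Q(X_i)$ together with the attained value of $\frac{1}{n}\sum_i \varphi(nQ(X_i))$, which works out to $\frac{1}{2}\bar{g}^2/s^2$ with $\bar{g}$ and $s^2$ the empirical mean and variance of the $g_i$.

```python
def chi2_projection(g: list) -> tuple:
    """Chi-square projection of P_n on M_theta^(n) for one scalar constraint.

    g[i] stands for g(X_i, theta). Returns (weights Q(X_i), divergence value).
    """
    n = len(g)
    g_bar = sum(g) / n
    s2 = sum((gi - g_bar) ** 2 for gi in g) / n
    if s2 == 0.0:
        raise ValueError("the g_i must not all be equal")
    c1 = -g_bar / s2   # coefficient attached to the moment constraint
    c0 = -c1 * g_bar   # coefficient attached to the total-mass constraint
    # phi'(x) = x - 1, so the inverse of phi' is t -> t + 1.
    q = [(1.0 + c0 + c1 * gi) / n for gi in g]
    divergence = sum(0.5 * (n * qi - 1.0) ** 2 for qi in q) / n
    return q, divergence
```

With `g = [1.0, 1.0, 1.0, 5.0]` the constraint cannot be met by any probability measure supported on the sample (all $g_i$ are positive), yet the signed solution exists and gives the last point a negative mass, which illustrates why the sets of signed measures are convenient here.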

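Remark 2.1. If there exists some $Q_0 \in \mathcal{M}^{(n)}_\theta$ such that

(2.8) $a < \inf_i nQ_0(X_i) \le \sup_i nQ_0(X_i) < b,$

then applying Corollary 2.6 in Borwein and Lewis (1991), we get

$\inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n} \sum_{i=1}^{n} \varphi(nQ(X_i)) = \sup_{t \in \mathbb{R}^{l+1}} \left\{ t_0 - \int \varphi^*\left(t^T \bar{g}(x,\theta)\right) dP_n(x) \right\}$

with dual attainment. Furthermore, if

$\varphi'(a) < \inf_i \widehat{c}_\theta^{\,T} \bar{g}(X_i,\theta) \le \sup_i \widehat{c}_\theta^{\,T} \bar{g}(X_i,\theta) < \varphi'(b),$

with $\widehat{c}_\theta$ a dual optimal, then the unique projection $\widehat{Q^*_\theta}$ of $P_n$ on $\mathcal{M}^{(n)}_\theta$ is given by (2.5).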

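We will make use of the dual representation of $\phi$-divergences (see Broniatowski and Keziou (2006)
Theorem 4.4). So, define

(2.9) $C_\theta := \left\{ t \in \mathbb{R}^{l+1} \text{ such that } t^T \bar{g}(\cdot,\theta) \text{ belongs to } \mathrm{Im}\, \varphi' \ (P_0\text{-a.s.}) \right\},$

and

(2.10) $C^{(n)}_\theta := \left\{ t \in \mathbb{R}^{l+1} \text{ such that } t^T \bar{g}(X_i,\theta) \text{ belongs to } \mathrm{Im}\, \varphi' \text{ for all } i = 1, \ldots, n \right\}.$

We omit the subscript $\theta$ when unnecessary. Note that both $C_\theta$ and $C^{(n)}_\theta$ depend upon the function
$\varphi$ but, for simplicity, we omit the subscript $\varphi$.

If $P_0$ admits a projection $Q^*_\theta$ on $\mathcal{M}_\theta$ with the same support as $P_0$, using the second part in Corollary
3.5 in Broniatowski and Keziou (2006), there exist constants $c_0, \ldots, c_l$, obviously depending on $\theta$,
such that

$\varphi'\left(\frac{dQ^*_\theta}{dP_0}(x)\right) = c_0 + \sum_{j=1}^{l} c_j\, g_j(x,\theta)$, for all $x$ ($P_0$-a.s.).

Since $Q^*_\theta$ belongs to $\mathcal{M}_\theta$, the real numbers $c_0, c_1, \ldots, c_l$ are solutions of

(2.11) $\int \varphi'^{-1}\left(c_0 + \sum_{i=1}^{l} c_i\, g_i(x,\theta)\right) dP_0(x) = 1,$
$\qquad \int g_j(x,\theta)\, \varphi'^{-1}\left(c_0 + \sum_{i=1}^{l} c_i\, g_i(x,\theta)\right) dP_0(x) = 0, \quad j = 1, \ldots, l.$

Since $Q \mapsto \phi(Q, P_0)$ is strictly convex, the projection $Q^*_\theta$ of $P_0$ on the convex set $\mathcal{M}_\theta$ is unique.
This implies, by Broniatowski and Keziou (2006) Corollary part 1, that the solution

$c_\theta := (c_0, c_1, \ldots, c_l)^T$

of the system (2.11) is unique provided that the functions $g_i(\theta)$ are linearly independent. Further,
using the dual representation of $\phi$-divergences (see Broniatowski and Keziou (2006) Theorem 4.4),
we get

$\phi(\mathcal{M}_\theta, P_0) := \phi(Q^*_\theta, P_0) = \sup_{f \in \mathcal{F}} \left\{ \int f\, dQ^*_\theta - \int \varphi^*(f)\, dP_0 \right\}$

and the sup is unique and is reached at $f = \varphi'(dQ^*_\theta / dP_0) = c_0 + \sum_{j=1}^{l} c_j\, g_j(\cdot,\theta)$ if it belongs to
$\mathcal{F}$. This motivates the choice of the class $\mathcal{F}$ through

$\mathcal{F} := \left\{ x \mapsto t^T \bar{g}(x,\theta) \text{ for } t \text{ in } C_\theta \right\}.$

It is the smallest class of functions that contains $\varphi'(dQ^*_\theta / dP_0)$ and which does not presume any
knowledge on $Q^*_\theta$. Since every $Q$ in $\mathcal{M}_\theta$ satisfies $\int t^T \bar{g}(x,\theta)\, dQ(x) = t_0$, we thus obtain

$\phi(\mathcal{M}_\theta, P_0) = \sup_{t \in C_\theta} \int m(x,\theta,t)\, dP_0(x),$

where $m(\theta,t)$ is the function defined on $\mathcal{X}$ by

$x \in \mathcal{X} \mapsto m(x,\theta,t) := t_0 - \varphi^*\left(t^T \bar{g}(x,\theta)\right)$
$\qquad = t_0 - \left(t^T \bar{g}(x,\theta)\right) \varphi'^{-1}\left(t^T \bar{g}(x,\theta)\right) + \varphi\left(\varphi'^{-1}\left(t^T \bar{g}(x,\theta)\right)\right).$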
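The two expressions of $m$ rest on the identity $\varphi^*(\varphi'(x)) = x\,\varphi'(x) - \varphi(x)$ for the Fenchel-Legendre transform. As an illustrative check, the sketch below writes down the conjugates of two generators, $\varphi^*(t) = t + t^2/2$ for the $\chi^2$ case and $\varphi^*(t) = -\log(1 - t)$ (for $t < 1$) for the $KL_m$ case, and compares them with this identity and with a crude grid maximization of $tx - \varphi(x)$. These closed forms are standard calculus results rather than formulas quoted from the text, and all function names are hypothetical.

```python
import math

# chi-square generator, its derivative and its conjugate
def phi_chi2(x): return 0.5 * (x - 1.0) ** 2
def dphi_chi2(x): return x - 1.0
def conj_chi2(t): return t + 0.5 * t * t

# modified Kullback-Leibler generator (x > 0), derivative and conjugate (t < 1)
def phi_klm(x): return -math.log(x) + x - 1.0
def dphi_klm(x): return 1.0 - 1.0 / x
def conj_klm(t): return -math.log(1.0 - t)

def fenchel_gap(phi, dphi, conj, x):
    """|phi*(phi'(x)) - (x phi'(x) - phi(x))|; zero when conj is the conjugate of phi."""
    t = dphi(x)
    return abs(conj(t) - (t * x - phi(x)))

def conj_on_grid(phi, t, grid):
    """Crude numerical conjugate: maximum of t*x - phi(x) over the grid points."""
    return max(t * x - phi(x) for x in grid)
```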

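With the above notation, we state

(2.12) $\phi(\mathcal{M}_\theta, P_0) = \sup_{t \in C_\theta} P_0 m(\theta,t).$

So, a natural estimate of $\phi(\mathcal{M}_\theta, P_0)$ is

(2.13) $\sup_{t \in C^{(n)}_\theta} P_n m(\theta,t),$

which coincides with the estimate defined in (1.20). Hence, we can write

(2.14) $\widehat{\phi}(\mathcal{M}_\theta, P_0) = \sup_{t \in C^{(n)}_\theta} P_n m(\theta,t),$

which transforms the constrained optimization in (1.20) into the above unconstrained one.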
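To make the unconstrained form concrete, consider the $KL_m$ case with a single scalar constraint. There $\varphi^*(t) = -\log(1-t)$, so $P_n m(\theta,t) = t_0 + \frac{1}{n}\sum_i \log(1 - t_0 - t_1 g(X_i,\theta))$, and the first-order conditions force $t_0 = 0$ at the maximizer; the dual problem therefore reduces to maximizing the concave function $t_1 \mapsto \frac{1}{n}\sum_i \log(1 - t_1 g(X_i,\theta))$. The sketch below solves this by bisection on the derivative; it assumes that 0 lies strictly inside the range of the $g_i$, and `kl_m_dual` is a hypothetical name.

```python
import math

def kl_m_dual(g: list) -> tuple:
    """Maximise t -> mean(log(1 - t * g_i)) for scalar values g_i = g(X_i, theta).

    Returns (maximiser t, attained value, weights 1 / (n * (1 - t * g_i))).
    """
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        raise ValueError("0 must lie strictly inside the range of the g_i")
    n = len(g)
    # Feasible set: 1 - t * g_i > 0 for all i, i.e. 1/g_min < t < 1/g_max.
    lo, hi = 1.0 / g_min, 1.0 / g_max

    def score(t):
        # Derivative of the objective; it is strictly decreasing in t.
        return -sum(gi / (1.0 - t * gi) for gi in g) / n

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    value = sum(math.log(1.0 - t * gi) for gi in g) / n
    weights = [1.0 / (n * (1.0 - t * gi)) for gi in g]
    return t, value, weights
```

The returned weights are positive, sum to one and satisfy the moment constraint, so they coincide with the usual empirical likelihood weights, and $2n$ times the returned value is the empirical likelihood ratio statistic discussed in the Introduction.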

On the other hand, the sup in (2.12) is reached at $t_0 = c_0, \ldots, t_l = c_l$, which are solutions of the
system of equations (2.11), i.e.,

(2.15) $c_\theta = \arg\sup_{t \in C_\theta} P_0 m(\theta,t).$

So, a natural estimate of $c_\theta$ in (2.15) is therefore defined through

(2.16) $\arg\sup_{t \in C^{(n)}_\theta} P_n m(\theta,t).$

This coincides with $\widehat{c}_\theta$, the solution of the system of equations (2.6). So, we can write

(2.17) $\widehat{c}_\theta = \arg\sup_{t \in C^{(n)}_\theta} P_n m(\theta,t).$

Using (2.14), we obtain the following representations for the estimates $\widehat{\phi}(\mathcal{M}, P_0)$ in (1.21) and
$\widehat{\theta}_\phi$ in (1.22)

(2.18) $\widehat{\phi}(\mathcal{M}, P_0) = \inf_{\theta \in \Theta} \sup_{t \in C^{(n)}_\theta} P_n m(\theta,t)$

and

(2.19) $\widehat{\theta}_\phi = \arg\inf_{\theta \in \Theta} \sup_{t \in C^{(n)}_\theta} P_n m(\theta,t),$

respectively.



Formula (2.12) also has the following basic interest: Consider the function

(2.20) $t \in C_\theta \mapsto P_0 m(\theta,t).$

In order for the integral in (2.20) to be properly defined, we assume that

(2.21) $\int |g_i(x,\theta)|\, dP_0(x) < \infty$, for all $i \in \{1, \ldots, l\}$.

The domain of the function (2.20) is

(2.22) $D_\phi(\theta) := \left\{ t \in C_\theta \text{ such that } P_0 m(\theta,t) > -\infty \right\}.$

The function $t \mapsto P_0 m(\theta,t)$ is strictly concave on the convex set $D_\phi(\theta)$. Whenever it has a
maximum $t^*$, then it is unique, and if it belongs to the interior of $D_\phi(\theta)$, then it satisfies the first
order condition. Therefore $t^*$ satisfies system (2.11). In turn, this implies that the measure $Q^*$
defined through $dQ^* := \varphi'^{-1}\left(t^{*T} \bar{g}(\theta)\right) dP_0$ is the projection of $P_0$ on $\mathcal{M}_\theta$, by Theorem 3.4 part
1 in Broniatowski and Keziou (2006). This implies that $Q^*$ and $P_0$ share the same support. We
summarize the above arguments as follows.

Proposition 2.2. Assume that (2.21) holds and that

(i) there exists some $s$ in the interior of $D_\phi(\theta)$ such that for all $t$ in $\partial D_\phi(\theta)$, the frontier of
$D_\phi(\theta)$, it holds $P_0 m(\theta,t) < P_0 m(\theta,s)$;

(ii) for all $t$ in the interior of $D_\phi(\theta)$, there exists a neighborhood $V(t)$ of $t$, such that the classes
of functions $\{x \mapsto m(x,\theta,r),\ r \in V(t)\}$ are dominated ($P_0$-a.s.) by some $P_0$-integrable
function $x \mapsto H(x,\theta)$.

Then $P_0$ admits a unique projection $Q^*_\theta$ on $\mathcal{M}_\theta$ having the same support as $P_0$ and

(2.23) $dQ^*_\theta = \varphi'^{-1}\left(c_\theta^{\,T} \bar{g}(\theta)\right) dP_0,$

where $c_\theta$ is the unique solution of the system of equations (2.11).

Remark 2.2. In the case of the $KL$-divergence, comparing this Proposition with Theorem 3.3 in
Csiszár (1975), we observe that the dual formula (2.12) provides weaker conditions on the class of
functions $\{\bar{g}(\theta),\ \theta \in \Theta\}$ than the geometric approach.



Remark 2.3. The result of Borwein and Lewis (1993), with some additional conditions, provides
more practical tools for obtaining the results in Proposition 2.2. Assume that the functions $g_j(\theta)$
belong to the space $L_p(\mathcal{X}, P_0)$ with $1 \le p \le \infty$ and that the following "constraint qualification"
holds

(2.24) there exists some $Q_0$ in $\mathcal{M}_\theta$ such that: $a < \inf \frac{dQ_0}{dP_0} \le \sup \frac{dQ_0}{dP_0} < b,$

where $(a, b)$ is the domain of the divergence function $\varphi$ and $\mathcal{M}_\theta$ is the set of all signed measures $Q$
a.c. w.r.t. $P_0$, satisfying the linear constraints and such that $\frac{dQ}{dP_0}$ belongs to $L_q(\mathcal{X}, P_0)$ ($1 \le q \le \infty$
and $1/p + 1/q = 1$). In this case, applying Corollary 2 in Borwein and Lewis (1993), we obtain

$\phi(\mathcal{M}_\theta, P_0) = \sup_{t \in \mathbb{R}^{l+1}} \left\{ t_0 - \int \varphi^*\left(t^T \bar{g}(x,\theta)\right) dP_0(x) \right\}$

(with dual attainment). Furthermore, if for a dual optimal $c_\theta$, it holds

$\lim_{y \to -\infty} \frac{\varphi(y)}{y} < \inf_x c_\theta^{\,T} \bar{g}(x,\theta) \le \sup_x c_\theta^{\,T} \bar{g}(x,\theta) < \lim_{y \to +\infty} \frac{\varphi(y)}{y}$ for all $x$ ($P_0$-a.s.),

then the unique projection $Q^*_\theta$ of $P_0$ on $\mathcal{M}_\theta$ is given by

(2.25) $dQ^*_\theta = \varphi^{*\prime}\left(c_\theta^{\,T} \bar{g}(\theta)\right) dP_0.$

Note that if $\varphi^*$ is strictly convex, then $c_\theta$ is unique and

$\sup_{t \in \mathbb{R}^{l+1}} \left\{ t_0 - \int \varphi^*\left(t^T \bar{g}(x,\theta)\right) dP_0(x) \right\} = \sup_{t \in C_\theta} \left\{ t_0 - \int \varphi^*\left(t^T \bar{g}(x,\theta)\right) dP_0(x) \right\},$

$\varphi^{*\prime}\left(c_\theta^{\,T} \bar{g}(\theta)\right) = \varphi'^{-1}\left(c_\theta^{\,T} \bar{g}(\theta)\right).$

Léonard (2001a) and Léonard (2001b) give, under minimal conditions, duality theorems for
minimum $\phi$-divergences and characterizations of projections under linear constraints, which generalize
the results given by Borwein and Lewis (1991) and Borwein and Lewis (1993). These results have
recently been used by Bertail (2004) and Bertail (2006) in empirical likelihood.



3. Asymptotic properties and Statistical Tests

In the sequel, we assume that the conditions in Proposition 2.1 (or Remark 2.1) and in Proposition
2.2 (or Remark 2.3) hold. This allows us to use the representations (2.14), (2.18) and (2.19) in
order to study the asymptotic behavior of the proposed estimates (1.20), (1.21) and (1.22). All the
results in the present Section are obtained through classical methods of parametric statistics; see
e.g. van der Vaart (1998) and Sen and Singer (1993). We first consider the case when $\theta$ is fixed,
and we study the asymptotic behavior of the estimate $\widehat{\phi}(\mathcal{M}_\theta, P_0)$ (see (1.20)) of $\phi(\mathcal{M}_\theta, P_0) :=
\inf_{Q \in \mathcal{M}_\theta} \phi(Q, P_0)$, both when $P_0 \in \mathcal{M}_\theta$ and when $P_0 \notin \mathcal{M}_\theta$. This is done in the first Subsection.
In the second Subsection, we study the asymptotic behavior of the ME$\phi$D estimates $\widehat{\theta}_\phi$ and the
estimates $\widehat{\phi}(\mathcal{M}, P_0)$, both when $P_0$ belongs to $\mathcal{M}$ and when $P_0$ does not belong to
$\mathcal{M}$. The solution of Problem 1 is given in Subsection 3.3, while Problem 2 is treated in Subsections
3.1, 3.2, 3.3 and 3.4.

3.1. Asymptotic properties of the estimates for a given $\theta\in\Theta$. First we state consistency.

Consistency. We state both weak and strong consistency of the estimates $\widehat{c}_\theta$ and $\widehat{\phi}(\mathcal{M}_\theta,P_0)$ using their representations (2.17) and (2.14), respectively. Denote by $\|\cdot\|$ the Euclidean norm defined on $\mathbb{R}^d$ or on $\mathbb{R}^{l+1}$. In order to state consistency, we need to define

$$T_\theta := \{t\in C_\theta \text{ such that } P_0 m(\theta,t) > -\infty\},$$

and denote by $T_\theta^c$ the complement of the set $T_\theta$ in the set $C_\theta$, namely

$$T_\theta^c := \{t\in C_\theta \text{ such that } P_0 m(\theta,t) = -\infty\}.$$

Note that, by Proposition 2.2, the set $T_\theta$ contains $c_\theta$.



We will consider the following conditions:
(C.1) $\sup_{t\in T_\theta} |P_n m(\theta,t) - P_0 m(\theta,t)|$ converges to 0 a.s. (resp. in probability);
(C.2) there exist $M < 0$ and $n_0 > 0$ such that, for all $n > n_0$, it holds $\sup_{t\in T_\theta^c} P_n m(\theta,t) < M$ a.s. (resp. in probability).

The condition (C.2) makes sense, since for all $t\in T_\theta^c$ we have $P_0 m(\theta,t) = -\infty$.

Since the function $t\in T_\theta \mapsto P_0 m(\theta,t)$ is strictly concave, the maximum $c_\theta$ is isolated, that is

(3.1) for any positive $\epsilon$, we have $\sup_{\{t\in C_\theta : \|t-c_\theta\|>\epsilon\}} P_0 m(\theta,t) < P_0 m(\theta,c_\theta)$.

Proposition 3.1. Assume that conditions (C.1) and (C.2) hold. Then

(i) the estimates $\widehat{\phi}(\mathcal{M}_\theta,P_0)$ converge to $\phi(\mathcal{M}_\theta,P_0)$ a.s. (resp. in probability).

(ii) the estimates $\widehat{c}_\theta$ converge to $c_\theta$ a.s. (resp. in probability).

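Asymptotic distributions. Denote by $m'(\theta,t)$ the $(l+1)$-dimensional vector with entries $\frac{\partial}{\partial t_i} m(\theta,t)$, by $m''(\theta,t)$ the $(l+1)\times(l+1)$-matrix with entries $\frac{\partial^2}{\partial t_i \partial t_j} m(\theta,t)$, $0_l := (0,\ldots,0)^T \in \mathbb{R}^l$, $0_d := (0,\ldots,0)^T \in \mathbb{R}^d$, $c$ the $(l+1)$-vector defined by $c := (0, 0_l^T)^T$, and $P_0 g(\theta)g(\theta)^T$ the $l\times l$-matrix defined by

$$P_0 g(\theta)g(\theta)^T := \left[P_0\, g_i(\theta) g_j(\theta)\right]_{i,j=1,\ldots,l}.$$

We will consider the following assumptions:
(A.1) $\widehat{c}_\theta$ converges in probability to $c_\theta$;

(A.2) the function $t\mapsto m(x,\theta,t)$ is $C^3$ on a neighborhood $V(c_\theta)$ of $c_\theta$ for all $x$ ($P_0$-a.s.), and all partial derivatives of order 3 of the function $\{t\mapsto m(x,\theta,t),\ t\in V(c_\theta)\}$ are dominated by some $P_0$-integrable function $x\mapsto H(x)$;

(A.3) $P_0\left(\|m'(\theta,c_\theta)\|^2\right)$ is finite, and the matrix $P_0 m''(\theta,c_\theta)$ exists and is invertible.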

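Theorem 3.2. Assume that assumptions (A.1-3) hold. Then

(1) $\sqrt{n}\left(\widehat{c}_\theta - c_\theta\right)$ converges to a centered multivariate normal variable with covariance matrix

(3.2) $$V = \left[-P_0 m''(\theta,c_\theta)\right]^{-1} \left[P_0\, m'(\theta,c_\theta) m'(\theta,c_\theta)^T\right] \left[-P_0 m''(\theta,c_\theta)\right]^{-1}.$$

In the special case when $P_0$ belongs to $\mathcal{M}_\theta$, then $c_\theta = c$ and

(3.3) $$V = \varphi''(1)^2 \begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0 g(\theta)g(\theta)^T\right]^{-1} \end{pmatrix}.$$

(2) If $P_0$ belongs to $\mathcal{M}_\theta$, then the statistics

$$\frac{2n}{\varphi''(1)}\,\widehat{\phi}(\mathcal{M}_\theta,P_0)$$

converge in distribution to a $\chi^2$ variable with $l$ degrees of freedom.

(3) If $P_0$ does not belong to $\mathcal{M}_\theta$, then

$$\sqrt{n}\left(\widehat{\phi}(\mathcal{M}_\theta,P_0) - \phi(\mathcal{M}_\theta,P_0)\right)$$

converges to a centered normal variable with variance

$$\sigma^2 := P_0\, m(\theta,c_\theta)^2 - \left(P_0\, m(\theta,c_\theta)\right)^2.$$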



Remark 3.1. (a) When specialized to the modified Kullback-Leibler divergence, Theorem 3.2 part (2) gives the limiting distribution of the empirical log-likelihood ratio $2n\widehat{KL}_m(\mathcal{M}_\theta,P_0)$, which is the result in Owen (1990), Theorem 1. Part (3) gives its limiting distribution when $P_0$ does not belong to $\mathcal{M}_\theta$.
(b) Nonparametric confidence regions ($CR_\phi$) for $\theta_0$ of asymptotic level $(1-\epsilon)$ can be constructed using the statistics

$$\frac{2n}{\varphi''(1)}\,\widehat{\phi}(\mathcal{M}_\theta,P_0)$$

through

$$CR_\phi := \left\{\theta\in\Theta \text{ such that } \frac{2n}{\varphi''(1)}\,\widehat{\phi}(\mathcal{M}_\theta,P_0) \le q_{(1-\epsilon)}\right\},$$

where $q_{(1-\epsilon)}$ is the $(1-\epsilon)$-quantile of a $\chi^2(l)$ distribution. It would be interesting to obtain the divergence leading to optimal confidence regions in the sense of Neyman (1937) (see Takagi (1998)), or the optimal divergence leading to confidence regions with small length (volume, area or diameter) and covering the true value $\theta_0$ with large enough probability.

3.2. Asymptotic properties of the estimates $\widehat{\theta}_\phi$ and $\widehat{\phi}(\mathcal{M},P_0)$. First we state consistency.

Consistency. We assume that when $P_0$ does not belong to the model $\mathcal{M}$, the minimum, say $\theta^*$, of the function $\theta\in\Theta \mapsto \inf_{Q\in\mathcal{M}_\theta}\phi(Q,P_0)$ exists and is unique. Hence $P_0$ admits a projection on $\mathcal{M}$, which we denote $Q^*_{\theta^*}$. Obviously when $P_0$ belongs to the model $\mathcal{M}$, then $\theta^* = \theta_0$ and $Q^*_{\theta^*} = P_0$. We will consider the following conditions:

(C.3) $\sup_{\{\theta\in\Theta,\ t\in T_\theta\}} |P_n m(\theta,t) - P_0 m(\theta,t)|$ tends to 0 a.s. (resp. in probability);
(C.4) there exists a neighborhood $V(c_{\theta^*})$ of $c_{\theta^*}$ such that

(a) for any positive $\epsilon$, there exists some positive $\eta$ such that for all $t\in V(c_{\theta^*})$ and all $\theta\in\Theta$ satisfying $\|\theta-\theta^*\| > \epsilon$, it holds $P_0 m(\theta^*,t) < P_0 m(\theta,t) - \eta$;

(b) there exists some function $H$ such that for all $t$ in $V(c_{\theta^*})$, we have $|m(t,\theta_0)| < H(x)$ ($P_0$-a.s.) with $P_0 H < \infty$;

(C.5) there exist $M < 0$ and $n_0 > 0$ such that for all $n > n_0$, we have

(3.4) $$\sup_{\theta\in\Theta}\ \sup_{t\in T_\theta^c} P_n m(\theta,t) < M \quad \text{a.s. (resp. in probability)}.$$

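Proposition 3.3. Assume that conditions (C.3-5) hold. Then

(i) the estimates $\widehat{\phi}(\mathcal{M},P_0)$ converge to $\phi(\mathcal{M},P_0)$ a.s. (resp. in probability).

(ii) $\sup_{\theta\in\Theta}\|\widehat{c}_\theta - c_\theta\|$ converges to 0 a.s. (resp. in probability).

(iii) the ME$\phi$D estimates $\widehat{\theta}_\phi$ converge to $\theta^*$ a.s. (resp. in probability).

Asymptotic distributions. When $P_0\in\mathcal{M}$, then by assumption there exists a unique $\theta_0\in\Theta$ such that $P_0\in\mathcal{M}_{\theta_0}$. Hence $\theta^* = \theta_0$ and $c_{\theta^*} = c_{\theta_0} = c$. We state the limit distributions of the estimates $\widehat{\theta}_\phi$ and $\widehat{c}_{\widehat{\theta}_\phi}$ when $P_0\in\mathcal{M}$ and when $P_0\notin\mathcal{M}$. We will make use of the following assumptions:

(A.4) both estimates $\widehat{\theta}_\phi$ and $\widehat{c}_{\widehat{\theta}_\phi}$ converge in probability to $\theta^*$ and $c_{\theta^*}$, respectively;
(A.5) the function $(\theta,t)\mapsto m(x,\theta,t)$ is $C^3$ on some neighborhood $V(\theta^*,c_{\theta^*})$ for all $x$ ($P_0$-a.s.), and the partial derivatives of order 3 of the functions $\{(\theta,t)\mapsto m(x,\theta,t),\ (\theta,t)\in V(\theta^*,c_{\theta^*})\}$ are dominated by some $P_0$-integrable function $H(x)$;
(A.6) $P_0\left(\left\|\frac{\partial}{\partial t} m(\theta^*,c_{\theta^*})\right\|^2\right)$ and $P_0\left(\left\|\frac{\partial}{\partial\theta} m(\theta^*,c_{\theta^*})\right\|^2\right)$ are finite, and the matrix

$$S := \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$$

with $S_{11} := P_0\frac{\partial^2}{\partial t^2} m(\theta^*,c_{\theta^*})$, $S_{12} = S_{21}^T := P_0\frac{\partial^2}{\partial t\,\partial\theta} m(\theta^*,c_{\theta^*})$ and $S_{22} := P_0\frac{\partial^2}{\partial\theta^2} m(\theta^*,c_{\theta^*})$, exists and is invertible.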
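Theorem 3.4. Let $P_0$ belong to $\mathcal{M}$ and let assumptions (A.4-6) hold. Then both $\sqrt{n}\left(\widehat{\theta}_\phi - \theta_0\right)$ and $\sqrt{n}\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right)$ converge in distribution to centered multivariate normal variables with covariance matrices, respectively,

(3.5) $$V = \left\{\left[P_0\tfrac{\partial}{\partial\theta} g(\theta_0)\right]^T \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} \left[P_0\tfrac{\partial}{\partial\theta} g(\theta_0)\right]\right\}^{-1}$$

and

$$U = \varphi''(1)^2 \begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix} - \varphi''(1)^2 \begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix} \begin{pmatrix} 0_d^T \\ P_0\tfrac{\partial}{\partial\theta} g(\theta_0) \end{pmatrix} V \begin{pmatrix} 0_d, & \left[P_0\tfrac{\partial}{\partial\theta} g(\theta_0)\right]^T \end{pmatrix} \begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix},$$

and the estimates $\widehat{\theta}_\phi$ and $\widehat{c}_{\widehat{\theta}_\phi}$ are asymptotically uncorrelated.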



Remark 3.2. When specialized to the modified Kullback-Leibler divergence, the estimate $\widehat{\theta}_{KL_m}$ is the empirical likelihood estimate (ELE) (noted $\tilde{\theta}$ in Qin and Lawless (1994)), and the above result gives the limiting distribution of $\sqrt{n}\left(\widehat{\theta}_{KL_m} - \theta_0\right)$, which coincides with the result in Theorem 1 in Qin and Lawless (1994). Note also that all ME$\phi$DE's, including the ELE, have the same limiting distribution with the same variance when $P_0$ belongs to $\mathcal{M}$. Hence they are all equally first-order efficient.

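Theorem 3.5. Assume that $P_0$ does not belong to $\mathcal{M}$ and that assumptions (A.4-6) hold. Then

$$\sqrt{n}\begin{pmatrix} \widehat{c}_{\widehat{\theta}_\phi} - c_{\theta^*} \\ \widehat{\theta}_\phi - \theta^* \end{pmatrix}$$

converges in distribution to a centered multivariate normal variable with covariance matrix

$$W = S^{-1} M S^{-1},$$

where

$$M := P_0\left[\begin{pmatrix} \frac{\partial}{\partial t} m(\theta^*,c_{\theta^*}) \\ \frac{\partial}{\partial\theta} m(\theta^*,c_{\theta^*}) \end{pmatrix} \begin{pmatrix} \frac{\partial}{\partial t} m(\theta^*,c_{\theta^*}) \\ \frac{\partial}{\partial\theta} m(\theta^*,c_{\theta^*}) \end{pmatrix}^T\right].$$

$\theta^*$ and $c_{\theta^*}$ are characterized by

$$\theta^* := \arg\inf_{\theta\in\Theta}\phi(\mathcal{M}_\theta,P_0), \qquad dQ^*_{\theta^*} = \varphi'^{-1}\left(c_{\theta^*}^T\overline{g}(\theta^*)\right) dP_0 \quad \text{and} \quad Q^*_{\theta^*}\in\mathcal{M}_{\theta^*}.$$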

3.3. Tests of model. In order to test the hypothesis $\mathcal{H}_0$ : $P_0$ belongs to $\mathcal{M}$ against the alternative $\mathcal{H}_1$ : $P_0$ does not belong to $\mathcal{M}$, we can use the estimates $\widehat{\phi}(\mathcal{M},P_0)$ of $\phi(\mathcal{M},P_0)$, the $\phi$-divergences between the model $\mathcal{M}$ and the distribution $P_0$. Since $\phi(\mathcal{M},P_0)$ is nonnegative and takes value 0 only when $P_0$ belongs to $\mathcal{M}$ (provided that $P_0$ admits a projection on $\mathcal{M}$), we reject the hypothesis $\mathcal{H}_0$ when the estimates take large values. In the following Corollary, we give the asymptotic law of the estimates $\widehat{\phi}(\mathcal{M},P_0)$ both under $\mathcal{H}_0$ and under $\mathcal{H}_1$.

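Corollary 3.6.

(i) Assume that the assumptions of Theorem 3.4 hold and that $l > d$. Then, under $\mathcal{H}_0$, the statistics

$$\frac{2n}{\varphi''(1)}\,\widehat{\phi}(\mathcal{M},P_0)$$

converge in distribution to a $\chi^2$ variable with $(l-d)$ degrees of freedom.
(ii) Assume that the assumptions of Theorem 3.5 hold. Then, under $\mathcal{H}_1$, we have:

(3.6) $$\sqrt{n}\left(\widehat{\phi}(\mathcal{M},P_0) - \phi(\mathcal{M},P_0)\right)$$

converges to a centered normal variable with variance

$$\sigma^2 = P_0\, m(\theta^*,c_{\theta^*})^2 - \left(P_0\, m(\theta^*,c_{\theta^*})\right)^2,$$

where $\theta^*$ and $c_{\theta^*}$ satisfy

$$\theta^* := \arg\inf_{\theta\in\Theta}\phi(\mathcal{M}_\theta,P_0),$$

with $c_{\theta^*}$ characterized as in Theorem 3.5.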

Remark 3.3. This result allows us to perform tests of model of asymptotic level $\alpha$; the critical regions are

(3.7) $$\left\{\frac{2n}{\varphi''(1)}\,\widehat{\phi}(\mathcal{M},P_0) > q_{(1-\alpha)}\right\},$$

where $q_{(1-\alpha)}$ is the $(1-\alpha)$-quantile of the $\chi^2$ distribution with $(l-d)$ degrees of freedom. Also these tests are all asymptotically powerful, since the estimates $\widehat{\phi}(\mathcal{M},P_0)$ are $n$-consistent estimates of $\phi(\mathcal{M},P_0) = 0$ under $\mathcal{H}_0$ and $\sqrt{n}$-consistent estimates of $\phi(\mathcal{M},P_0)$ under $\mathcal{H}_1$.

We assume now that the p.m. $P_0$ belongs to $\mathcal{M}$. We will perform simple and composite tests on the parameter $\theta_0$, taking into account the information $P_0\in\mathcal{M}$.

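3.4. Simple tests on the parameter. Let

(3.8) $\mathcal{H}_0 : \theta_0 = \theta_1$ versus $\mathcal{H}_1 : \theta_0\in\Theta\setminus\{\theta_1\}$,

where $\theta_1$ is a given known value. We can use the following statistics to perform tests pertaining to (3.8):

$$S_n^\phi := \widehat{\phi}(\mathcal{M}_{\theta_1},P_0) - \inf_{\theta\in\Theta}\widehat{\phi}(\mathcal{M}_\theta,P_0).$$

Since

$$\phi(\mathcal{M}_{\theta_1},P_0) - \inf_{\theta\in\Theta}\phi(\mathcal{M}_\theta,P_0) = \phi(\mathcal{M}_{\theta_1},P_0)$$

is nonnegative and takes value 0 only when $\theta_0 = \theta_1$, we reject the hypothesis $\mathcal{H}_0$ when the statistics $S_n^\phi$ take large values.

We give the limit distributions of the statistics $S_n^\phi$ in the following Corollary, which can be proved using some algebra and the arguments used in the proofs of Theorem 3.4 and Theorem 3.5.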



Corollary 3.7.

(i) Assume that the assumptions of Theorem 3.4 hold. Then under $\mathcal{H}_0$, the statistics

$$\frac{2n}{\varphi''(1)}\,S_n^\phi$$

converge in distribution to a $\chi^2$ variable with $d$ degrees of freedom.
(ii) Assume that the assumptions of Theorem 3.4 hold. Then under $\mathcal{H}_1$,

$$\sqrt{n}\left(S_n^\phi - \phi(\mathcal{M}_{\theta_1},P_0)\right)$$

converges to a centered normal variable with variance

$$\sigma^2 = P_0\, m(\theta_1,c_{\theta_1})^2 - \left(P_0\, m(\theta_1,c_{\theta_1})\right)^2.$$

Remark 3.4. When specialized to the $KL_m$-divergence, the statistic $2nS_n^{KL_m}$ is the empirical likelihood ratio statistic (see Qin and Lawless (1994), Theorem 2).

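3.5. Composite tests on the parameter. Let

(3.9) $h : \mathbb{R}^d \to \mathbb{R}^k$

be some function such that the $(d\times k)$-matrix $H(\theta) := \frac{\partial}{\partial\theta}h(\theta)$ exists, is continuous and has rank $k$ with $0 < k < d$. Let us define the composite null hypothesis

(3.10) $\Theta_0 := \{\theta\in\Theta \text{ such that } h(\theta) = 0\}$.

We consider the composite test

(3.11) $\mathcal{H}_0 : \theta_0\in\Theta_0$ versus $\mathcal{H}_1 : \theta_0\in\Theta\setminus\Theta_0$,

i.e., the test

(3.12) $\mathcal{H}_0 : P_0\in\bigcup_{\theta\in\Theta_0}\mathcal{M}_\theta$ versus $\mathcal{H}_1 : P_0\in\bigcup_{\theta\in\Theta\setminus\Theta_0}\mathcal{M}_\theta$.

This test is equivalent to the following one

(3.13) $\mathcal{H}_0 : \theta_0\in f(B_0)$ versus $\mathcal{H}_1 : \theta_0\notin f(B_0)$,

where $f : \mathbb{R}^{(d-k)} \to \mathbb{R}^d$ is a function such that the matrix $G(\beta) := \frac{\partial}{\partial\beta}f(\beta)$ exists and has rank $(d-k)$, and $B_0 := \{\beta\in\mathbb{R}^{(d-k)} \text{ such that } f(\beta)\in\Theta_0\}$. Therefore $\theta_0\in\Theta_0$ is an equivalent statement for $\theta_0 = f(\beta_0)$, $\beta_0\in B_0$.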

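The following statistics are used to perform tests pertaining to (3.13):

$$T_n^\phi := \inf_{\beta\in B_0}\widehat{\phi}(\mathcal{M}_{f(\beta)},P_0) - \inf_{\theta\in\Theta}\widehat{\phi}(\mathcal{M}_\theta,P_0).$$

Since

$$\inf_{\beta\in B_0}\phi(\mathcal{M}_{f(\beta)},P_0) - \inf_{\theta\in\Theta}\phi(\mathcal{M}_\theta,P_0) = \inf_{\beta\in B_0}\phi(\mathcal{M}_{f(\beta)},P_0)$$

is nonnegative and takes value 0 only when $\mathcal{H}_0$ holds, we reject the hypothesis $\mathcal{H}_0$ when the statistics $T_n^\phi$ take large values.

We give the limit distributions of the statistics $T_n^\phi$ in the following Corollary.

Corollary 3.8.

(i) Assume that the assumptions of Theorem 3.4 hold. Under $\mathcal{H}_0$, the statistics $\frac{2n}{\varphi''(1)}\,T_n^\phi$ converge in distribution to a $\chi^2$ variable with $(d-k)$ degrees of freedom.

(ii) Assume that there exists $\beta^*\in B_0$ such that $\beta^* = \arg\inf_{\beta\in B_0}\phi(\mathcal{M}_{f(\beta)},P_0)$. If the assumptions of Theorem 3.5 hold for $\theta^* = f(\beta^*)$, then

$$\sqrt{n}\left(T_n^\phi - \phi(\mathcal{M}_{\theta^*},P_0)\right)$$

converges to a centered normal variable with variance

$$\sigma^2 = P_0\, m(\theta^*,c_{\theta^*})^2 - \left(P_0\, m(\theta^*,c_{\theta^*})\right)^2.$$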






MICHEL BRONIATOWSKI* AND AMOR KEZIOU** 



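4. Estimates of the distribution function through projected distributions

In this Section, the measurable space $(\mathcal{X},\mathcal{B})$ is $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$. For all $\phi$-divergence, by (1.21), we have

$$\widehat{\phi}(\mathcal{M},P_0) = \phi\left(\widehat{Q}^*_{\widehat{\theta}_\phi},P_n\right).$$

Proposition 2.1 above provides the description of $\widehat{Q}^*_{\widehat{\theta}_\phi}$.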

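So, for all $\phi$-divergence, we estimate the distribution function $F$ using $\widehat{Q}^*_{\widehat{\theta}_\phi}$, the $\phi$-projection of $P_n$ on $\mathcal{M}$, through

$$\widehat{F}_n(x) := \sum_{i=1}^n \widehat{Q}^*_{\widehat{\theta}_\phi}(X_i)\,\mathbb{1}_{(-\infty,x]}(X_i)$$

(4.1) $$= \frac{1}{n}\sum_{i=1}^n \varphi'^{-1}\left(\widehat{c}^{\,T}_{\widehat{\theta}_\phi}\,\overline{g}(X_i,\widehat{\theta}_\phi)\right)\mathbb{1}_{(-\infty,x]}(X_i).$$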



Remark 4.1. When the estimating equation

(4.2) $$\frac{1}{n}\sum_{i=1}^n g(X_i,\theta) = 0$$

admits a solution $\tilde{\theta}_n$, then $P_n$ belongs to $\mathcal{M}$. If the solution is unique then $\widehat{\theta}_\phi = \tilde{\theta}_n$. Hence by Proposition 2.1, for all $i\in\{1,2,\ldots,n\}$, we have $\widehat{Q}^*_{\widehat{\theta}_\phi}(X_i) = \frac{1}{n}$, and $\widehat{F}_n(x)$, in this case, is the empirical cumulative distribution function, i.e.,

$$\widehat{F}_n(x) = F_n(x) := \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x]}(X_i).$$

So, the main interest is in the case where (4.2) does not admit a solution, that is in general when $l > d$.

Remark 4.2. The $\phi$-projections $\widehat{Q}^*_{\widehat{\theta}_\phi}$ of $P_n$ on $\mathcal{M}$ may be signed measures. For all $\phi$-divergence satisfying $D_\varphi = \mathbb{R}_+^*$, the $\phi$-projection $\widehat{Q}^*_{\widehat{\theta}_\phi}$ is a p.m. if it exists (for example, $KL_m$, $KL$, Hellinger, and $\chi^2_m$ divergences all provide p.m.'s).

We give the limit law of the estimates $\widehat{F}_n$ of the distribution function $F$ in the following Theorem. We will see that the estimate $\widehat{F}_n(x)$ is generally more efficient than the empirical cumulative distribution function $F_n(x)$.



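Theorem 4.1. Under the assumptions of Theorem 3.4, $\sqrt{n}\left(\widehat{F}_n(x) - F(x)\right)$ converges in distribution to a centered normal variable with variance

(4.3) $$W(x) = F(x)\left(1-F(x)\right) - \left[P_0\left(g(\theta_0)\mathbb{1}_{(-\infty,x]}\right)\right]^T \Gamma \left[P_0\left(g(\theta_0)\mathbb{1}_{(-\infty,x]}\right)\right],$$

with

$$\Gamma = \left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1} - \left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1}\left[P_0\tfrac{\partial}{\partial\theta}g(\theta_0)\right] V \left[P_0\tfrac{\partial}{\partial\theta}g(\theta_0)\right]^T\left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1},$$

$$V = \left\{\left[P_0\tfrac{\partial}{\partial\theta}g(\theta_0)\right]^T\left[P_0 g(\theta_0)g(\theta_0)^T\right]^{-1}\left[P_0\tfrac{\partial}{\partial\theta}g(\theta_0)\right]\right\}^{-1}.$$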



5. Empirical likelihood and related methods

In the present setting, the empirical likelihood (EL) approach for the estimation of the parameter $\theta_0$ can be summarized as follows. For any $\theta$ in $\Theta$, define the profile likelihood ratio of the sample $X := (X_1,\ldots,X_n)$ through

$$L_n(\theta) := \sup\left\{\prod_{i=1}^n nQ(X_i) \text{ where } Q(X_i)\ge 0,\ \sum_{i=1}^n Q(X_i) = 1,\ \sum_{i=1}^n g(X_i,\theta)Q(X_i) = 0\right\}.$$

The estimate of $\theta_0$ through the empirical likelihood (EL) approach is then defined by

(5.1) $$\widehat{\theta}_{EL} := \arg\sup_{\theta\in\Theta} L_n(\theta).$$

The paper by Qin and Lawless (1994) introduces $\widehat{\theta}_{EL}$ and presents its properties. In this Section, we show that $\widehat{\theta}_{EL}$ belongs to the family of ME$\phi$D estimates for the specific choice $\varphi(x) = -\log x + x - 1$. We also discuss the problem of the existence of the solution of (5.1) for all $n$.



When $\varphi(x) = -\log x + x - 1$, formula (1.22) clearly coincides with $\widehat{\theta}_{EL}$. For tests of hypotheses given by $\mathcal{H}_0 : P_0\in\mathcal{M}_\theta$ against $\mathcal{H}_1 : P_0\notin\mathcal{M}_\theta$, or for construction of nonparametric confidence regions for $\theta_0$, the statistic $2n\widehat{KL}_m(\mathcal{M}_\theta,P_0)$ coincides with the empirical log-likelihood ratio introduced in Owen (1988), Owen (1990) and Qin and Lawless (1994). We state the results of Section 3 in the present context. We will see that the approach of empirical likelihood by divergence minimization, using the dual representation of the $KL_m$-divergence and the explicit form of the $KL_m$-projection of $P_0$, yields the limit distribution of the statistic $2n\widehat{KL}_m(\mathcal{M}_\theta,P_0)$ under $\mathcal{H}_1$, which cannot be achieved using the approach in Owen (1990) and Qin and Lawless (1994). Consider

$$\widehat{\theta}_{KL_m} = \arg\inf_{\theta\in\Theta}\widehat{KL}_m(\mathcal{M}_\theta,P_0),$$

where

(5.2) $$\widehat{KL}_m(\mathcal{M}_\theta,P_0) = \sup_{t\in C_\theta} P_n m(\theta,t)$$

with $\varphi(x) = \varphi_0(x) = -\log x + x - 1$. The explicit form of $m(\theta,t)$ in this case is

(5.3) $$x\mapsto m(x,\theta,t) = t_0 + \log\left(1 - t^T\overline{g}(x,\theta)\right).$$

For fixed $\theta\in\Theta$, the sup in (5.2), which we have noted $\widehat{c}_\theta$, satisfies the following system

(5.4) $$\int\frac{1}{1 - t^T\overline{g}(x,\theta)}\,dP_n(x) - 1 = 0, \qquad \int\frac{g_j(x,\theta)}{1 - t^T\overline{g}(x,\theta)}\,dP_n(x) = 0 \ \text{ for all } j = 1,\ldots,l,$$

a system of $(l+1)$ equations and $(l+1)$ variables. The projection $\widehat{Q}^*_\theta$ is then obtained using Proposition 2.1 part (ii). We have for all $i\in\{1,\ldots,n\}$

$$1 - \frac{1}{n\widehat{Q}^*_\theta(X_i)} = \widehat{c}_0 + \sum_{j=1}^l \widehat{c}_j\, g_j(X_i,\theta),$$

which, multiplying by $\widehat{Q}^*_\theta(X_i)$ and summing over $i$, yields $\widehat{c}_0 = 0$. Therefore the system (5.4) reduces to the system (3.3) in Qin and Lawless (1994), replacing $c_1,\ldots,c_l$ by $-t_1,\ldots,-t_l$. Simplify (5.3) by plugging in $t_0 = 0$. Notice that $2n\widehat{KL}_m(\mathcal{M}_\theta,P_0) = 2l_E(\theta)$ in the notation of Qin and Lawless (1994), and that the function of $t = (0,-\tau_1,\ldots,-\tau_l)$ defined by

$$t\mapsto P_n m(\theta,t)$$

coincides with the function

$$\tau\mapsto P_n\log\left(1 + \tau^T g(\cdot,\theta)\right)$$

used in Qin and Lawless (1994). The interest of formula (5.2) lies in obtaining the limit distributions of $2n\widehat{KL}_m(\mathcal{M}_\theta,P_0)$ under $\mathcal{H}_1$. By Theorem 3.2,

$$\sqrt{n}\left(\widehat{KL}_m(\mathcal{M}_\theta,P_0) - KL_m(\mathcal{M}_\theta,P_0)\right)$$

converges to a normal variable, which proves consistency of the test; this result cannot be obtained by the approach of Qin and Lawless (1994).
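The dual computation described above can be illustrated on the simplest constraint. The following is only a minimal sketch, assuming a scalar mean constraint $g(x,\theta) = x - \theta$ and using the Qin and Lawless parametrization $\tau$ (so that $t = (0,-\tau)$ in the notation of this Section); the function name `el_dual_mean` and the use of bisection are illustrative choices, not taken from the paper.

```python
import math

def el_dual_mean(xs, theta):
    """Sketch: solve sum_i g_i / (1 + tau * g_i) = 0 with g_i = x_i - theta.

    Returns (tau, weights, kl_m) where weights_i = 1 / (n * (1 + tau * g_i))
    and kl_m = (1/n) * sum_i log(1 + tau * g_i) is the dual value.
    Hypothetical helper, valid only when theta lies strictly inside
    the range of the sample.
    """
    g = [x - theta for x in xs]
    n = len(g)
    if not (min(g) < 0.0 < max(g)):
        raise ValueError("theta must lie strictly between min(xs) and max(xs)")
    # All 1 + tau * g_i must stay positive: -1/max(g) < tau < -1/min(g).
    lo = (-1.0 / max(g)) * (1.0 - 1e-12)
    hi = (-1.0 / min(g)) * (1.0 - 1e-12)

    def score(tau):
        return sum(gi / (1.0 + tau * gi) for gi in g)

    # score is strictly decreasing in tau, so bisection finds its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    weights = [1.0 / (n * (1.0 + tau * gi)) for gi in g]
    kl_m = sum(math.log(1.0 + tau * gi) for gi in g) / n
    return tau, weights, kl_m
```

At the solution the weights sum to one and satisfy the constraint, which is the finite-sample counterpart of $\widehat{c}_0 = 0$; when $\theta$ equals the sample mean, $\tau = 0$ and all weights equal $1/n$.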



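The choice of $\varphi$ depends on some a priori knowledge on $\theta_0$. Fortunately, some divergences do not have this drawback. We now clarify this point. For fixed $\theta$ in $\Theta$, let $\mathcal{M}_\theta^{(n)}$ and $D_\phi^{(n)}$ be defined as before (see (1.19) for $\mathcal{M}_\theta^{(n)}$). Assume that $\mathcal{M}_\theta^{(n)}\cap D_\phi^{(n)}$ is not void. Then $P_n$ has a projection on $\mathcal{M}_\theta^{(n)}$ and $\phi(\mathcal{M}_\theta^{(n)},P_n)$ is finite. The estimation of $\theta_0$ is achieved by minimizing $\widehat{\phi}(\mathcal{M}_\theta,P_0)$ on the sets

$$\Theta_n^\phi := \left\{\theta\in\Theta \text{ such that } \mathcal{M}_\theta^{(n)}\cap D_\phi^{(n)} \text{ is not void}\right\}.$$

Clearly the description of $\Theta_n^\phi$ depends on the divergence $\phi$. Consider the following example, with $n = 2$, $X = (X_1,X_2)$ and $g(x,\theta) = x - \theta$. Then

$$\mathcal{M}_\theta^{(2)} = \left\{(q_1,q_2)^T \text{ such that } q_1 + q_2 = 1 \text{ and } q_1(X_1-\theta) + q_2(X_2-\theta) = 0\right\}$$

and

$$D_\phi^{(2)} = \left\{(q_1,q_2)^T \text{ such that } \frac{1}{2}\sum_{i=1}^2\varphi(2q_i) < \infty\right\}.$$

When $\phi = KL_m$, then $D_{KL_m}^{(2)} = \mathbb{R}_+^*\times\mathbb{R}_+^*$. So, according to the value of $\theta$, $\mathcal{M}_\theta^{(n)}\cap D_\phi^{(n)}$ may be void and therefore $\Theta_n^{KL_m}$ has a complex structure. At the opposite, for example when $\phi = \chi^2$, then $D_{\chi^2}^{(n)} = \mathbb{R}^n$. Hence $\mathcal{M}_\theta^{(n)}\cap D_{\chi^2}^{(n)} = \mathcal{M}_\theta^{(n)}$, which is not void for all $\theta$, and hence $\Theta_n^{\chi^2} = \Theta$.

On the other hand, we have for any $\phi$-divergence

$$\widehat{\theta}_\phi := \arg\inf_{\theta\in\Theta}\ \inf_{Q\in\mathcal{M}_\theta^{(n)}}\phi(Q,P_n) = \arg\inf_{\theta\in\Theta_n^\phi}\ \inf_{Q\in\mathcal{M}_\theta^{(n)}}\phi(Q,P_n).$$

When $D_\phi^{(n)}\neq\mathbb{R}^n$, the infimum in $\theta$ above should be taken upon $\Theta_n^\phi$, which might be quite cumbersome. Owen (2001) indeed mentions such a difficulty.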
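The two-point example can be made concrete. The sketch below is an illustration under the assumptions of that example ($n = 2$, $g(x,\theta) = x - \theta$, $X_1\neq X_2$): it solves the two linear constraints for $(q_1,q_2)$ and checks whether the solution has strictly positive coordinates, which is what the $KL_m$ domain requires, whereas a divergence with domain $\mathbb{R}^2$ such as $\chi^2$ accepts any real solution. The function names are hypothetical.

```python
def two_point_weights(x1, x2, theta):
    """Solve q1 + q2 = 1 and q1*(x1 - theta) + q2*(x2 - theta) = 0 (x1 != x2)."""
    q1 = (x2 - theta) / (x2 - x1)
    q2 = (theta - x1) / (x2 - x1)
    return q1, q2

def feasible_for_klm(x1, x2, theta):
    """True when the constrained weights are strictly positive (KL_m domain)."""
    q1, q2 = two_point_weights(x1, x2, theta)
    return q1 > 0.0 and q2 > 0.0

def feasible_for_chi2(x1, x2, theta):
    """The chi-square domain is all of R^2, so a real solution always qualifies."""
    two_point_weights(x1, x2, theta)  # always defined when x1 != x2
    return True
```

With $X_1 = 1$ and $X_2 = 3$, the value $\theta = 2$ is admissible for both divergences, while $\theta = 4$ gives the signed weights $(-0.5, 1.5)$ and is admissible only for $\chi^2$; in this toy case the admissible set for $KL_m$ is the open interval between the two observations.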



In relation to this problem, Qin and Lawless (1994) bring some asymptotic arguments in the case of the empirical likelihood. They show that there exists a sequence of neighborhoods

$$V_n(\theta_0) := \left\{\theta \text{ such that } \|\theta-\theta_0\| \le n^{-1/3}\right\}$$

on which, with probability one as $n$ tends to infinity, $L_n(\theta)$ has a maximum. In the context of $\phi$-divergences, this amounts to saying that the mapping

$$\theta\mapsto\inf_{Q\in\mathcal{M}_\theta^{(n)}} KL_m(Q,P_n)$$

has a minimum when $\theta$ belongs to $V_n(\theta_0)$. This interesting result does not solve the problem for fixed $n$, as $\theta_0$ is unknown. For such a problem, the use of $\phi$-divergences satisfying $D_\phi^{(n)} = \mathbb{R}^n$ (for example the $\chi^2$-divergence) might give information about $\theta_0$ and localize it through $\phi$-divergence confidence regions ($CR_\phi$'s).

The choice of the divergence $\phi$ also depends upon some knowledge on the support of the unknown p.m. $P_0$. When $P_0$ has a projection on $\mathcal{M}$ with the same support as $P_0$, Proposition 2.2 yields its description and its explicit calculation. A necessary condition for this is that $C_\theta$, as defined in (2.9), has non void interior in $\mathbb{R}^{(l+1)}$. Consider the case of the empirical likelihood, that is when $\varphi(x) = -\log x + x - 1$; then $\mathrm{Im}\,\varphi' = \,]-\infty,1[$. Consider $g(x,\theta) = x - \theta$, i.e., a constraint on the mean. Assume that the support of $P_0$ is unbounded. Then

$$C_\theta = \left\{t\in\mathbb{R}^2 \text{ such that for all } x\ (P_0\text{-a.s.}),\ t_0 + t_1(x-\theta)\in\,]-\infty,1[\right\}.$$

Therefore $t_1 = 0$ and $C_\theta = \,]-\infty,1[\,\times\{0\}$, which implies that the interior of $C_\theta$ is void. This result indicates that the support of $Q^*_\theta$ is not the same as the support of $P_0$. Hence in this case we cannot use the dual representation of $KL_m(\mathcal{M}_\theta,P_0)$. If the support of $P_0$ is unbounded, the arguments used in Section 3 to obtain limiting distributions cannot be used to obtain the limiting distribution of the estimates $\widehat{KL}_m(\mathcal{M}_\theta,P_0)$ under $\mathcal{H}_1$ (i.e., when $P_0$ does not belong to $\mathcal{M}_\theta$). We thus cannot conclude in this case that the tests pertaining to $\theta_0$ are consistent.

6. Robustness and Efficiency of ME$\phi$D estimates and Simulation Results

Lindsay (1994) introduced a general instrument for the study of the asymptotic properties of parametric estimates by minimum $\phi$-divergences, called the Residual Adjustment Function (RAF). We first recall its definition. Let $\{P_\theta ;\ \theta\in\Theta\}$ be some parametric model defined on a finite set $\mathcal{X}$. Let $X_1,\ldots,X_n$ be a sample with distribution $P_{\theta_0}$. A minimum $\phi$-divergence estimate (M$\phi$DE) (also called minimum disparity estimator) of $\theta_0$ is given by

(6.1) $$\widehat{\theta}_\phi := \arg\inf_{\theta\in\Theta}\sum_{x\in\mathcal{X}}\varphi\left(\frac{P_\theta(x)}{p_n(x)}\right)p_n(x),$$

where $p_n(x)$ is the proportion of the sample points that take value $x$. When the parametric model $\{P_\theta :\ \theta\in\Theta\}$ is regular, then $\widehat{\theta}_\phi$ is a solution of the equation

(6.2) $$\sum_{x\in\mathcal{X}}\varphi'\left(\frac{P_\theta(x)}{p_n(x)}\right)\dot{P}_\theta(x) = 0,$$

which can be written as

(6.3) $$\sum_{x\in\mathcal{X}} A_\varphi(\delta(x))\,\dot{P}_\theta(x) = 0.$$

In this display, $A_\varphi(u) := \varphi'\left(\frac{1}{u+1}\right)$ depends only upon the divergence function $\varphi$, and

$$\delta(x) := \frac{p_n(x)}{P_\theta(x)} - 1$$

is the "Pearson residual" at $x$, which belongs to $]-1,+\infty[$. The function $A_\varphi(\cdot)$ is the RAF.
The points $x$ for which $\delta(x)$ is close to $-1$ are called "inliers", whereas points $x$ such that $\delta(x)$ is large are called "outliers". Efficiency properties are linked with the behavior of $A_\varphi(\cdot)$ in the neighborhood of 0 (see Lindsay (1994), Proposition 3, and Basu and Lindsay (1994)): the smaller the value of $|A_\varphi''(0)|$, the more second-order efficient the estimate $\widehat{\theta}_\phi$ in the sense of Rao (1961).



It is easy to verify that the RAF's of the power divergences $\phi_\gamma$, defined by the divergence functions in (1.4), have the form

(6.4) $$A_\gamma(\delta) = \frac{(\delta+1)^{1-\gamma} - 1}{\gamma - 1}.$$

In particular, the M$\phi_\gamma$DE of (6.2) with the RAF in (6.4) corresponds to the maximum likelihood when $\gamma = 0$, minimum Hellinger distance when $\gamma = 0.5$, minimum $\chi^2$ divergence when $\gamma = 2$, minimum modified $\chi^2$ divergence when $\gamma = -1$ and minimum $KL$ divergence when $\gamma = 1$.

From (6.4), we see that $A_\gamma''(0) = \gamma$. Hence for the maximum likelihood estimate, we have $|A''(0)| = |A_0''(0)| = 0$, which is the smallest value of $|A_\gamma''(0)|$, $\gamma\in\mathbb{R}$. Therefore, according to Proposition 3 in Lindsay (1994), the maximum likelihood estimate is the most second-order efficient estimate (in the sense of Rao (1961)) among all minimum power divergence estimates.
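The identity $A_\gamma''(0) = \gamma$ can be checked numerically. The sketch below assumes the power-divergence RAF in the form $A_\gamma(\delta) = \left((\delta+1)^{1-\gamma} - 1\right)/(\gamma - 1)$, with the limit $-\log(\delta+1)$ at $\gamma = 1$, and approximates the second derivative at 0 by a central finite difference; the helper names and the step size are illustrative choices.

```python
import math

def raf_power(delta, gamma):
    """Assumed RAF of the power divergence with index gamma (delta > -1)."""
    if gamma == 1.0:
        # Limit of the general formula as gamma tends to 1.
        return -math.log(delta + 1.0)
    return ((delta + 1.0) ** (1.0 - gamma) - 1.0) / (gamma - 1.0)

def raf_curvature_at_zero(gamma, h=1e-4):
    """Central finite-difference approximation of A_gamma''(0)."""
    return (raf_power(h, gamma) - 2.0 * raf_power(0.0, gamma)
            + raf_power(-h, gamma)) / (h * h)

# Curvatures for the divergences named in the text:
# gamma = 0 (KL_m / maximum likelihood), 0.5 (Hellinger), 1 (KL),
# 2 (chi-square), -1 (modified chi-square).
curvatures = {g: raf_curvature_at_zero(g) for g in (0.0, 0.5, 1.0, 2.0, -1.0)}
```

Under this assumed form, the computed curvature is approximately $\gamma$ in each case, so $|A_\gamma''(0)|$ is smallest for $\gamma = 0$ and the Hellinger choice $\gamma = 0.5$ comes next among the listed divergences.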



Robustness features of $\widehat{\theta}_\phi$ against inliers and outliers are related to the variations of $A_\varphi(u)$ when $u$ is close to $-1$ and $+\infty$, respectively (equivalently, through $x = 1/(u+1)$, to the behavior of $\varphi(x)$ when $x$ is close to $+\infty$ and 0), as seen through the following heuristic arguments. Let $\phi_1$ and $\phi_2$ be two divergences associated to the functions $\varphi_1$ and $\varphi_2$. If

$$\lim_{x\downarrow 0}\frac{\varphi_1(x)}{\varphi_2(x)} = +\infty,$$

then the estimating equation (6.2) corresponding to $\varphi_1$ is not as stable as that corresponding to $\varphi_2$, and hence the ME$\phi_2$DE is more robust than the ME$\phi_1$DE against outliers. If

$$\lim_{x\uparrow+\infty}\frac{\varphi_1(x)}{\varphi_2(x)} = +\infty,$$

then the estimating equation (6.2) corresponding to $\varphi_1$ is not as stable as that corresponding to $\varphi_2$, and hence the ME$\phi_2$DE is more robust than the ME$\phi_1$DE against inliers.

In all cases, the divergence associated to the divergence function having the smallest variations on its domain leads to the most robust estimate against both outliers and inliers.



It is also shown in Jiménez and Shao (2001) that no minimum power divergence estimate (including the maximum likelihood one) is better than the minimum Hellinger divergence estimate in terms of both second-order efficiency and robustness.

In the examples below, we compare by simulations the efficiency and robustness properties of some ME$\phi$DE's for some models satisfying linear constraints. We will see that the minimum empirical Hellinger divergence estimate represents a suitable compromise between efficiency and robustness. A theoretical study of efficiency and robustness properties of ME$\phi$DE's is necessary and should involve second-order efficiency versus robustness, since all ME$\phi$DE's are equally first-order efficient (see Remark 3.2 and Theorem 3.4).



Numerical Results. We consider for illustration the same model as in Qin and Lawless (1994), Section 5, Example 1. The model $\mathcal{M}_\theta$ (see (1.12)) here is the set of all signed finite measures $Q$ satisfying

(6.5) $$\int dQ = 1 \quad \text{and} \quad \int g(x,\theta)\,dQ(x) = 0,$$

with $g(x,\theta) = \left((x-\theta),\ (x^2 - 2\theta^2 - 1)\right)^T$ and $\theta$, the parameter of interest, belongs to $\mathbb{R}$.

In Examples 1.a and 1.b below, we compare the efficiency property of various estimates: we generate 1000 pseudorandom samples of sizes 25, 50, 75 and 100 from a normal distribution with mean $\theta_0$ and variance $\theta_0^2 + 1$ (i.e., $P_0 = \mathcal{N}(\theta_0,\theta_0^2+1)$) for two values of $\theta_0$: $\theta_0 = 0$ in Example 1.a and $\theta_0 = 1$ in Example 1.b. Note that $P_0$ satisfies (6.5).

For each sample, we consider various estimates of $\theta_0$: the sample mean estimate (SME), the parametric ML estimate (MLE) based on the normal distribution $\mathcal{N}(\theta,\theta^2+1)$, and the ME$\phi$D estimates $\widehat{\theta}_\phi$ associated to the divergences $\phi = \chi^2_m$, $H$, $KL$, $\chi^2$ and the $KL_m$-divergence (which coincides with the MEL one, i.e., ME$KL_m$DE = MELE).

For all divergence $\phi$ considered, in order to calculate the ME$\phi$DE $\widehat{\theta}_\phi$, we first calculate $\widehat{\phi}(\mathcal{M}_\theta,P_0)$ for all given $\theta$ (using the representation (2.14)) by Newton's method, and then minimize it to obtain $\widehat{\theta}_\phi$.



The results of Theorem 3.4 show that for all $\phi$-divergence

$$\sqrt{n}\left(\widehat{\theta}_\phi - \theta_0\right) \to \mathcal{N}(0,V),$$

where $V$ is independent of the divergence $\phi$; it is given in Theorem 3.4. For the present model, following Qin and Lawless (1994), $V$ writes

(6.6) $$V = Var(X) - \Delta^{-1}\left[m'(\theta_0)Var(X) + \theta_0 m(\theta_0) - E(X^3)\right]^2,$$

where $\Delta = E\left[m'(\theta_0)(X-\theta_0) + m(\theta_0) - X^2\right]^2$ and $m(\theta) = 2\theta^2 + 1$. Thus $V \le Var(X)$, which is the variance of $\sqrt{n}\left(\overline{X}_n - \theta_0\right)$ with $\overline{X}_n := \frac{1}{n}\sum_{i=1}^n X_i$ the sample mean estimate (SME) of $\theta_0$. So, EM$\phi$D estimates are all asymptotically at least as efficient as $\overline{X}_n$.
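To get a feeling for the size of the efficiency gain, the variance formula can be evaluated in closed form for the normal model used in the simulations. This is a sketch under stated assumptions: $X\sim\mathcal{N}(\theta_0,\theta_0^2+1)$, $m(\theta) = 2\theta^2+1$, and the formula $V = Var(X) - \Delta^{-1}[m'(\theta_0)Var(X) + \theta_0 m(\theta_0) - E(X^3)]^2$ with $\Delta = E[m'(\theta_0)(X-\theta_0) + m(\theta_0) - X^2]^2$; only standard normal moments are used, and the function name is hypothetical.

```python
def asymptotic_variances(theta0):
    """Return (V, Var(X)) for X ~ N(theta0, theta0**2 + 1).

    Uses the normal moments E[Z^2] = s2, E[Z^3] = 0, E[Z^4] = 3*s2**2
    for the centered variable Z = X - theta0.
    """
    s2 = theta0 ** 2 + 1.0                 # Var(X)
    m = 2.0 * theta0 ** 2 + 1.0            # m(theta0) = E[X^2]
    m_prime = 4.0 * theta0                 # m'(theta0)
    ex3 = theta0 ** 3 + 3.0 * theta0 * s2  # E[X^3] for a normal variable
    bracket = m_prime * s2 + theta0 * m - ex3
    # m'(X - theta0) + m - X^2 = a*Z + b - Z^2 with:
    a = m_prime - 2.0 * theta0
    b = m - theta0 ** 2
    # E[(a*Z + b - Z^2)^2]; the odd moments of Z vanish.
    delta = a * a * s2 + b * b + 3.0 * s2 * s2 - 2.0 * b * s2
    return s2 - bracket ** 2 / delta, s2
```

Under these assumptions the function gives $V = Var(X) = 1$ at $\theta_0 = 0$ (no gain over the sample mean) and $V = 1$ against $Var(X) = 2$ at $\theta_0 = 1$, so the limit variance is halved in the second case.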

6.1. Example 1.a. In this example the true value of the parameter is $\theta_0 = 0$.

         ME$\chi^2_m$DE       ME$KL_m$DE=MELE      MEHDE                MEKLDE
  n      mean      var        mean      var        mean      var        mean      var
  25     0.0089    0.0314     0.0086    0.0315     0.0084    0.0315     0.0082    0.0314
  50    -0.0116    0.0209    -0.0118    0.0210    -0.0119    0.0210    -0.0120    0.0210
  75    -0.0025    0.0171    -0.0024    0.0170    -0.0023    0.0170    -0.0022    0.0169
  100   -0.0172    0.0112    -0.0174    0.0111    -0.0174    0.0111    -0.0175    0.0112

         ME$\chi^2$DE         PMLE                 SME
  n      mean      var        mean      var        mean      var
  25     0.0077    0.0313     0.0026    0.0318     0.0081    0.0394
  50    -0.0125    0.0212    -0.0063    0.0196    -0.0040    0.0200
  75    -0.0019    0.0167    -0.0011    0.0170     0.0013    0.0164
  100   -0.0177    0.0112    -0.0158    0.0108    -0.0149    0.0102

Table 1. Estimated mean and variance of the estimates of $\theta_0$ in Example 1.a.

We can see from Table 1 that all the estimates converge in a satisfactory way. The estimated variances are almost the same for all estimates. This is not surprising since the limit variance of all estimates in this Example (when $\theta_0 = 0$) is close to $Var(X)$.

6.2. Example 1.b. In this example the true value of the parameter is $\theta_0 = 1$.

         ME$\chi^2_m$DE       ME$KL_m$DE=MELE      MEHDE                MEKLDE
  n      mean      var        mean      var        mean      var        mean      var
  25     0.9394    0.0310     0.9387    0.0312     0.9385    0.0313     0.9378    0.0316
  50     0.9994    0.0186     0.9967    0.0186     0.9954    0.0186     0.9941    0.0187
  75     1.0009    0.0156     0.9988    0.0154     0.9975    0.0154     0.9966    0.0153
  100    0.9984    0.0113     0.9959    0.0112     0.9945    0.0112     0.99315   0.0112

         ME$\chi^2$DE         PMLE                 SME
  n      mean      var        mean      var        mean      var
  25     0.9350    0.0322     0.9540    0.0325     1.0033    0.0810
  50     0.9909    0.0190     1.0036    0.0174     1.0021    0.0407
  75     0.9940    0.0152     1.0003    0.0149     0.9912    0.0288
  100    0.9900    0.0113     0.9970    0.0107     0.9851    0.0262

Table 2. Estimated mean and variance of the estimates of $\theta_0$ in Example 1.b.

We can see from Table 2 and Figure 1 that the estimated biases of the EM$\phi$DE's are all smaller than the SME one for moderate and large sample sizes. Furthermore, from Figure 2 we observe that the estimated variances of the EM$\phi$DE's are all less than the SME one. They lie between that of the sample mean and that of the parametric maximum likelihood estimate. We observe also that the estimated variances of the MELE and MEHDE are equal and are the smallest among the variances of all ME$\phi$DE's considered. It should be emphasized that even for small sample sizes, the MSE of the SME is larger than that of any of the ME$\phi$DE's.

In Examples 2.a and 2.b below, we compare the robustness property of the estimates considered above for contaminated data: we consider the same model $\mathcal{M}_\theta$ as in (6.5).

6.3. Example 2.a. In this Example, we generate 1000 pseudo-random samples of sizes 25, 50, 75 and 100 from the distribution

$$(1-\epsilon)P_0 + \epsilon\,\delta_5,$$

where $P_0 = \mathcal{N}(\theta_0,\theta_0^2+1)$, $\epsilon = 0.15$ and $\theta_0 = 2$. We consider the same estimates as in the above examples.

In this Example, we can see from Table 3 and Figure 3 that the ME$\chi^2$D estimate is the most robust and the ME$\chi^2_m$D estimate is the least robust. We observe also that the MELE, which is the ME$KL_m$DE, is less robust than the ME$KL$DE, and that the MEHD estimate is more robust than the MEL one.

6.4. Example 2.b. In this Example, we generate 1000 pseudo-random samples of sizes 50, 100, 150 and 200 from a distribution $P_0 = \mathcal{N}(\theta_0,\theta_0^2+1)$ with $\theta_0 = 2$, and we cancel the observations in the interval $[4,5]$. We consider the same estimates as in the above examples.

In this example, in contrast with Example 2.a, we observe that the ME$\chi^2_m$DE is the most robust, the ME$\chi^2$DE is the least robust and the ME$KL$DE is less robust than the ME$KL_m$DE (=MELE). Generally,



LINEAR CONSTRAINTS WITH UNKNOWN PARAMETERS 



Figure 1. Estimated mean of the estimates of $\theta_0$ in Example 1.b (horizontal axis: sample sizes).
Figure 2. Estimated variance of the estimates of $\theta_0$ in Example 1.b.

if a ME$\phi$DE is more robust than its adjoint (i.e., the ME$\tilde{\phi}$DE) against "outliers", then it is less robust than its adjoint against "inliers" (see Examples 2.a and 2.b). The Hellinger divergence does not have this disadvantage since it is self-adjoint (i.e., $H = \tilde{H}$).

For all divergence associated to a convex function $\varphi$, its adjoint, noted $\tilde{\phi}$, is the divergence associated to the convex function, noted $\tilde{\varphi}$, defined by: $\tilde{\varphi}(x) = x\varphi(1/x)$, for all $x$.

         ME$\chi^2_m$DE       ME$KL_m$DE=MELE      MEHDE                MEKLDE
  n      mean      var        mean      var        mean      var        mean      var
  25     2.1609    0.0654     2.1513    0.0653     2.1453    0.0653     2.1396    0.0652
  50     2.2087    0.0303     2.1975    0.0304     2.1912    0.0307     2.1848    0.0309
  75     2.2218    0.0214     2.2106    0.0213     2.2046    0.0213     2.1987    0.0215
  100    2.2283    0.0151     2.2169    0.0149     2.2110    0.0148     2.2052    0.0149

         ME$\chi^2$DE         PMLE                 SME
  n      mean      var        mean      var        mean      var
  25     2.1278    0.0646     2.2088    0.0581     2.4265    0.2178
  50     2.1729    0.0316     2.2296    0.0280     2.4535    0.1076
  75     2.1877    0.0219     2.2337    0.0197     2.4545    0.0721
  100    2.1947    0.0151     2.2352    0.0139     2.4572    0.0543

Table 3. Estimated mean and variance of the estimates of $\theta_0$ in Example 2.a.

Figure 3. Estimated mean of the estimates of $\theta_0$ in Example 2.a.
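The adjoint relation $\tilde{\varphi}(x) = x\varphi(1/x)$ is easy to check numerically. The sketch below assumes the usual forms of the divergence functions, $\varphi_H(x) = 2(\sqrt{x}-1)^2$ for Hellinger, $\varphi_{KL}(x) = x\log x - x + 1$ and $\varphi_{KL_m}(x) = -\log x + x - 1$; these explicit forms and the helper names are assumptions of the illustration, not quotations from the paper.

```python
import math

def adjoint(phi):
    """Return the adjoint function x -> x * phi(1/x), defined for x > 0."""
    return lambda x: x * phi(1.0 / x)

def phi_hellinger(x):
    return 2.0 * (math.sqrt(x) - 1.0) ** 2

def phi_kl(x):
    return x * math.log(x) - x + 1.0

def phi_klm(x):
    return -math.log(x) + x - 1.0

# Largest discrepancy on a small grid of positive points.
grid = [0.1, 0.5, 1.0, 2.0, 7.5]
gap_hellinger = max(abs(adjoint(phi_hellinger)(x) - phi_hellinger(x)) for x in grid)
gap_kl_klm = max(abs(adjoint(phi_kl)(x) - phi_klm(x)) for x in grid)
```

Both gaps are zero up to rounding: the Hellinger function is its own adjoint, while the adjoint of $KL$ is $KL_m$, which is consistent with the two estimates exchanging their robustness ranking between Examples 2.a and 2.b.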

7. Proofs 

7.1. Proof of Proposition [27T1 Proof of part (i). The function 

n 

(g(Xi), . . . , Q{Xn)f eW'^-J^^ (nQ(XO) 



chi2m 

--- H 
KL 

- - chi2 
-- MEL 

-o- ML 
■-- SM 
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MEx 


™DE 


MEA'L„,DE==MELE 




MEXLDE 




mean 


var 


11108,11 






rnCcLii 


var 


50 


1.9917 


0.0451 


1.9784 


0.0431 


1.9721 0.0426 


1.9659 


0.0423 


100 


1.9962 


0.0362 


1.9844 


0.0346 


1.9787 0.0341 


1.9729 


0.0336 


150 


2.0011 


0.0150 


1.9903 


0.0142 


1.9849 0.0139 


1.9795 


0.0137 


200 


1.9602 


0.0162 


1.9516 


0.0158 


1.9473 0.0157 


1.9430 


0.0156 




MEx^DE 




PMLE 


SME 






n 


mean 


var 


mean 


var 


mean var 






50 


1.9522 


0.0428 


1.9705 


0.0358 


1.7750 0.1039 






100 


1.9590 


0.0329 


1.9687 


0.0298 


1.7365 0.0576 






150 


1.9671 


0.0135 


1.9781 


0.0121 


1.7456 0.0283 






200 


1.9325 


0.0155 


1.9420 


0.0146 


1.7247 0.0317 







Table 4. Estimated mean and variance of the estimates of 9^ in Example 2.b. 
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Figure 4. Estimated mean of the estimates of 9q in Example 2.b. 

is continuous and nonnegative on 'D'"^\ Furthermore, the set A^^"^ is closed in R". Hence, by 
condition ()2.3p . the infimum of the function 

1 " 

(g(Xi), . . . , Q{X^)f e R" - ^ ^ (nQ(XO) 

on the set T>^^^ n A^g""* exists as an interior point of T)^^^ . Since the above function is strictly 

convex and the set P^"-* n A^^"'' is convex, then this infimum is unique. It is noted Q*^. This 
concludes the proof of part (i). 

Proof of part (ii). Since (Q(Xi), . . . , Q(X„))^ e M» i ^^'^^ (nQ(X,)) is on the interior 
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of , and since Qg is in the interior of , we can use the Lagrange method. This yields the 
expHcit form p.5|) of the projection Qg in which cq is the Lagrange multipher associated to the 
constraint X]r=i Qi-^i) — 1 to the constraint Q{^i)9j{^ii ^) = Oj for a-U J = • ■ ■ j ^■ 



This concludes the proof of Proposition [2TTJ 
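The Lagrange step in part (ii) can be made concrete in one special case. The sketch below is an illustration under stated assumptions, not the paper's procedure: it takes the $\chi^2$ generator $\varphi(x) = \frac{1}{2}(x-1)^2$, for which the stationarity condition $\varphi'(nQ(X_i)) = c_0 + \sum_j c_j g_j(X_i,\theta)$ gives $nQ(X_i) = 1 + c_0 + \sum_j c_j g_j(X_i,\theta)$, and a single hypothetical constraint $g(x,\theta) = x - \theta$. The function name and the sample values are made up for the example. The multipliers then solve a 2 x 2 linear system, and the resulting weights may be negative, which is why the projection is taken over signed measures.

```python
def chi2_projection_weights(xs, theta):
    """Weights Q(X_i) of the chi-square projection of the empirical measure
    on {Q : sum Q(X_i) = 1, sum Q(X_i) * (X_i - theta) = 0}.

    Assumptions (not from the paper): phi(x) = (x - 1)^2 / 2 and the single
    constraint function g(x, theta) = x - theta.
    """
    n = len(xs)
    g = [x - theta for x in xs]            # hypothetical constraint values
    m1 = sum(g) / n                        # empirical mean of g
    m2 = sum(v * v for v in g) / n         # empirical second moment of g
    var = m2 - m1 * m1
    if var <= 0.0:
        raise ValueError("g must not be constant on the sample")
    # Lagrange multipliers solving the two linear constraints
    c1 = -m1 / var
    c0 = m1 * m1 / var
    # n * Q(X_i) = 1 + c0 + c1 * g(X_i); weights can be negative
    return [(1.0 + c0 + c1 * v) / n for v in g]


xs = [0.3, 1.2, 2.5, 0.9, 1.7]
q = chi2_projection_weights(xs, theta=1.0)
total = sum(q)
moment = sum(qi * (x - 1.0) for qi, x in zip(q, xs))
```

Here `total` should equal 1 and `moment` should equal 0 up to rounding, which are exactly the two constraints carried by the multipliers $c_0$ and $c_1$.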

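7.2. Proof of Proposition 3.1. Define the estimates

$$\widetilde{c}_\theta := \arg\sup_{t \in T_\theta} P_n m(\theta,t) \quad\text{and}\quad \widetilde{\phi}(\mathcal{M}_\theta,P_0) := \sup_{t \in T_\theta} P_n m(\theta,t).$$

By condition (C.2), for all $n$ sufficiently large, we have

$$\widehat{c}_\theta = \widetilde{c}_\theta \quad\text{and}\quad \widehat{\phi}(\mathcal{M}_\theta,P_0) = \widetilde{\phi}(\mathcal{M}_\theta,P_0).$$

We prove that $\widetilde{\phi}(\mathcal{M}_\theta,P_0)$ and $\widetilde{c}_\theta$ converge to $\phi(\mathcal{M}_\theta,P_0)$ and $c_\theta$ respectively. Since $c_\theta$ is isolated,
consistency of $\widetilde{c}_\theta$ holds as a consequence of Theorem 5.7 in van der Vaart (1998). For the
estimate $\widetilde{\phi}(\mathcal{M}_\theta,P_0)$, we have

$$\left|\widetilde{\phi}(\mathcal{M}_\theta,P_0) - \phi(\mathcal{M}_\theta,P_0)\right| = \left|P_n m(\theta,\widetilde{c}_\theta) - P_0 m(\theta,c_\theta)\right| =: |A|,$$

which implies

$$P_n m(\theta,c_\theta) - P_0 m(\theta,c_\theta) \le A \le P_n m(\theta,\widetilde{c}_\theta) - P_0 m(\theta,\widetilde{c}_\theta).$$

Both the RHS and the LHS terms in the above display go to 0 under condition (C.1). This implies
that $A$ tends to 0, which concludes the proof of Proposition 3.1.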



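7.3. Proof of Theorem 3.2. Proof of part (1). Some calculus yields

$$P_0 m'(\theta,c_\theta) = P_0\left(1 - \psi'(c_\theta^T\overline{g}(\theta)),\ -g_1(\theta)\,\psi'(c_\theta^T\overline{g}(\theta)),\ \ldots,\ -g_l(\theta)\,\psi'(c_\theta^T\overline{g}(\theta))\right)^T \tag{7.1}$$

and

$$P_0 m''(\theta,c_\theta) = -P_0\left[\overline{g}_i(\theta)\,\overline{g}_j(\theta)\,\psi''(c_\theta^T\overline{g}(\theta))\right]_{i,j=0,\ldots,l}, \tag{7.2}$$

which implies that the matrix $P_0 m''(\theta,c_\theta)$ is symmetric. Under assumption (A.2), by Taylor
expansion, there exists $t_n \in \mathbb{R}^{l+1}$ inside the segment that links $c_\theta$ and $\widehat{c}_\theta$ with

$$0 = P_n m'(\theta,\widehat{c}_\theta) = P_n m'(\theta,c_\theta) + \left(P_n m''(\theta,c_\theta)\right)^T(\widehat{c}_\theta - c_\theta) + \frac{1}{2}(\widehat{c}_\theta - c_\theta)^T P_n m'''(\theta,t_n)(\widehat{c}_\theta - c_\theta), \tag{7.3}$$

in which $P_n m'''(\theta,t_n)$ is an $(l+1)$-vector whose entries are $(l+1)\times(l+1)$-matrices. By (A.2),
we have for the sup-norm of vectors and matrices

$$\left\|P_n m'''(\theta,t_n)\right\| := \left\|\frac{1}{n}\sum_{i=1}^n m'''(X_i,\theta,t_n)\right\| \le \frac{1}{n}\sum_{i=1}^n |H(X_i)|.$$

By the Law of Large Numbers (LLN), $P_n m'''(\theta,t_n) = O_P(1)$. So using (A.1), we can write
the last term in the right hand side of (7.3) as $o_P(1)(\widehat{c}_\theta - c_\theta)$. On the other hand, by (A.3),
$P_n m''(\theta,c_\theta) := \frac{1}{n}\sum_{i=1}^n m''(X_i,\theta,c_\theta)$ converges to the matrix $P_0 m''(\theta,c_\theta)$. Write $P_n m''(\theta,c_\theta)$ as
$P_0 m''(\theta,c_\theta) + o_P(1)$ to obtain from (7.3)

$$-P_n m'(\theta,c_\theta) = \left(P_0 m''(\theta,c_\theta) + o_P(1)\right)(\widehat{c}_\theta - c_\theta). \tag{7.4}$$

Under (A.3), by the Central Limit Theorem, we have $\sqrt{n}\,P_n m'(\theta,c_\theta) = O_P(1)$, which by (7.4)
implies that $\sqrt{n}(\widehat{c}_\theta - c_\theta) = O_P(1)$. Hence, from (7.4), we get

$$\sqrt{n}(\widehat{c}_\theta - c_\theta) = \left[-P_0 m''(\theta,c_\theta)\right]^{-1}\sqrt{n}\,P_n m'(\theta,c_\theta) + o_P(1). \tag{7.5}$$

Under (A.3), the Central Limit Theorem concludes the proof of part (1). In the case when $P_0$ belongs
to $\mathcal{M}_\theta$, then $c_\theta^T = (\varphi'(1), 0_l^T) =: c^T$ and calculation yields

$$P_0\, m'(\theta,c)\,m'(\theta,c)^T = \begin{pmatrix} 0 & 0_l^T \\ 0_l & P_0\, g(\theta)g(\theta)^T \end{pmatrix} \quad\text{and}\quad -\varphi''(1)\,P_0 m''(\theta,c) = \begin{pmatrix} 1 & 0_l^T \\ 0_l & P_0\, g(\theta)g(\theta)^T \end{pmatrix}.$$

A simple calculation yields (3.3).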

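Proof of part (2). By Taylor expansion, there exists $t_n$ inside the segment that links $c_\theta$ and $\widehat{c}_\theta$
with

$$\widehat{\phi}_n(\mathcal{M}_\theta,P_0) = P_n m(\theta,\widehat{c}_\theta) = P_n m(\theta,c_\theta) + \left(P_n m'(\theta,c_\theta)\right)^T(\widehat{c}_\theta - c_\theta) + \frac{1}{2}(\widehat{c}_\theta - c_\theta)^T\left[P_n m''(\theta,c_\theta)\right](\widehat{c}_\theta - c_\theta)$$
$$\qquad + \frac{1}{3!}\sum_{0 \le i,j,k \le l}(\widehat{c}_\theta - c_\theta)_i(\widehat{c}_\theta - c_\theta)_j(\widehat{c}_\theta - c_\theta)_k\,\frac{\partial^3 P_n m(\theta,t_n)}{\partial t_i\,\partial t_j\,\partial t_k}. \tag{7.6}$$

When $P_0$ belongs to $\mathcal{M}_\theta$, then $c_\theta = c$. Hence $P_n m(\theta,c_\theta) = P_n m(\theta,c) = P_n 0 = 0$. Furthermore,
by part (1) in Theorem 3.2, it holds that $\sqrt{n}(\widehat{c}_\theta - c_\theta) = O_P(1)$. Hence, by (A.1), (A.2) and (A.3), we
get

$$\widehat{\phi}_n(\mathcal{M}_\theta,P_0) = \left(P_n m'(\theta,c_\theta)\right)^T(\widehat{c}_\theta - c_\theta) + \frac{1}{2}(\widehat{c}_\theta - c_\theta)^T\left[P_0 m''(\theta,c_\theta)\right](\widehat{c}_\theta - c_\theta) + o_P(1/n),$$

which by (7.5) implies

$$\widehat{\phi}_n(\mathcal{M}_\theta,P_0) = \left[P_n m'(\theta,c_\theta)\right]^T\left[-P_0 m''(\theta,c_\theta)\right]^{-1}\left[P_n m'(\theta,c_\theta)\right] + \frac{1}{2}\left[P_n m'(\theta,c_\theta)\right]^T\left[P_0 m''(\theta,c_\theta)\right]^{-1}\left[P_n m'(\theta,c_\theta)\right] + o_P(1/n)$$
$$\qquad = \frac{1}{2}\left[P_n m'(\theta,c_\theta)\right]^T\left[-P_0 m''(\theta,c_\theta)\right]^{-1}\left[P_n m'(\theta,c_\theta)\right] + o_P(1/n).$$

This yields

$$\frac{2n}{\varphi''(1)}\,\widehat{\phi}_n(\mathcal{M}_\theta,P_0) = \left[\sqrt{n}\,P_n m'(\theta,c_\theta)\right]^T\left[-\varphi''(1)\,P_0 m''(\theta,c_\theta)\right]^{-1}\left[\sqrt{n}\,P_n m'(\theta,c_\theta)\right] + o_P(1). \tag{7.7}$$

Note that when $P_0$ belongs to $\mathcal{M}_\theta$, then $c_\theta = c$ and calculation yields

$$P_0\, m'(\theta,c)\,m'(\theta,c)^T = \begin{pmatrix} 0 & 0_l^T \\ 0_l & P_0\, g(\theta)g(\theta)^T \end{pmatrix} \quad\text{and}\quad -\varphi''(1)\,P_0 m''(\theta,c) = \begin{pmatrix} 1 & 0_l^T \\ 0_l & P_0\, g(\theta)g(\theta)^T \end{pmatrix}.$$

Combining this with (7.7), we conclude the proof of part (2).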

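Proof of part (3). Since $(\widehat{c}_\theta - c_\theta) = O_P(1/\sqrt{n})$ and $P_n m'(\theta,c_\theta) = P_0 m'(\theta,c_\theta) + o_P(1) = 0 + o_P(1) = o_P(1)$,
then, using (7.6), we obtain

$$\sqrt{n}\left(\widehat{\phi}_n(\mathcal{M}_\theta,P_0) - \phi(\mathcal{M}_\theta,P_0)\right) = \sqrt{n}\left(\widehat{\phi}_n(\mathcal{M}_\theta,P_0) - P_0 m(\theta,c_\theta)\right) = \sqrt{n}\left(P_n m(\theta,c_\theta) - P_0 m(\theta,c_\theta)\right) + o_P(1),$$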
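and the Central Limit Theorem yields the conclusion of the proof of Theorem 3.2.

7.4. Proof of Proposition 3.3. Define the estimates

$$\widetilde{\theta}_\phi := \arg\inf_{\theta\in\Theta}\,\sup_{t\in T_\theta} P_n m(\theta,t),$$

$$\widetilde{\phi}(\mathcal{M},P_0) := \inf_{\theta\in\Theta}\,\sup_{t\in T_\theta} P_n m(\theta,t)$$

and, for all $\theta \in \Theta$,

$$\widetilde{c}_\theta := \arg\sup_{t\in T_\theta} P_n m(\theta,t).$$

By condition (C.5), for all $n$ sufficiently large, it holds that

$$\widehat{\theta}_\phi = \widetilde{\theta}_\phi \quad\text{and}\quad \widehat{\phi}(\mathcal{M},P_0) = \widetilde{\phi}(\mathcal{M},P_0).$$

We prove that $\widetilde{\theta}_\phi$ and $\widetilde{\phi}(\mathcal{M},P_0)$ are consistent. First, we prove the consistency of $\widetilde{\phi}(\mathcal{M},P_0)$. We
have

$$\left|\widetilde{\phi}(\mathcal{M},P_0) - \phi(\mathcal{M},P_0)\right| = \left|P_n m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) - P_0 m\left(\theta^*,c_{\theta^*}\right)\right| =: |A|.$$

This implies

$$P_n m\left(\widetilde{\theta}_\phi,c_{\widetilde{\theta}_\phi}\right) - P_0 m\left(\widetilde{\theta}_\phi,c_{\widetilde{\theta}_\phi}\right) \le A \le P_n m\left(\theta^*,\widetilde{c}_{\theta^*}\right) - P_0 m\left(\theta^*,\widetilde{c}_{\theta^*}\right).$$

By condition (C.3), both the RHS and LHS terms in the above display go to 0. This implies that
$A$ tends to 0, which concludes the proof of part (i).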

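Proof of part (ii). Since for sufficiently large $n$, by condition (C.5), we have $\widehat{c}_\theta = \widetilde{c}_\theta$ for all $\theta \in \Theta$,
the convergence of $\sup_{\theta\in\Theta}\|\widetilde{c}_\theta - c_\theta\|$ to 0 implies (ii). We now prove that $\sup_{\theta\in\Theta}\|\widetilde{c}_\theta - c_\theta\|$ tends
to 0. By the very definition of $\widetilde{c}_\theta$ and condition (C.3), we have

$$P_n m(\theta,\widetilde{c}_\theta) \ge P_n m(\theta,c_\theta) \ge P_0 m(\theta,c_\theta) - o_P(1), \tag{7.8}$$

where $o_P(1)$ does not depend upon $\theta$ (due to condition (C.3)). Hence, we have for all $\theta \in \Theta$,

$$P_0 m(\theta,c_\theta) - P_0 m(\theta,\widetilde{c}_\theta) \le P_n m(\theta,\widetilde{c}_\theta) - P_0 m(\theta,\widetilde{c}_\theta) + o_P(1). \tag{7.9}$$

The term in the RHS of the above display is less than

$$\sup_{\theta\in\Theta,\ t\in T_\theta}\left|P_n m(\theta,t) - P_0 m(\theta,t)\right| + o_P(1),$$

which by (C.3) tends to 0. Let $\epsilon > 0$ be such that $\sup_{\theta\in\Theta}\|\widetilde{c}_\theta - c_\theta\| > \epsilon$. There exists some $a_n \in \Theta$
such that $\|\widetilde{c}_{a_n} - c_{a_n}\| > \epsilon$. Together with the strict concavity of the function $t \in T_\theta \mapsto P_0 m(\theta,t)$
for all $\theta \in \Theta$, there exists $\eta > 0$ such that

$$P_0 m(a_n,c_{a_n}) - P_0 m(a_n,\widetilde{c}_{a_n}) > \eta.$$

We then conclude that

$$P\left\{\sup_{\theta\in\Theta}\|\widetilde{c}_\theta - c_\theta\| > \epsilon\right\} \le P\left\{P_0 m(a_n,c_{a_n}) - P_0 m(a_n,\widetilde{c}_{a_n}) > \eta\right\},$$

and the RHS term tends to 0 by (7.9). This concludes the proof of part (ii).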

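Proof of part (iii). We prove that $\widetilde{\theta}_\phi$ converges to $\theta^*$. By the very definition of $\widetilde{\theta}_\phi$, condition
(C.4.b) and part (ii), we obtain

$$P_n m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) \le P_n m\left(\theta^*,\widetilde{c}_{\theta^*}\right) \le P_0 m\left(\theta^*,c_{\theta^*}\right) - o_P(1),$$

from which

$$P_0 m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) - P_0 m\left(\theta^*,c_{\theta^*}\right) \le P_0 m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) - P_n m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) + o_P(1)$$
$$\qquad \le \sup_{\theta\in\Theta,\ t\in T_\theta}\left|P_n m(\theta,t) - P_0 m(\theta,t)\right| + o_P(1). \tag{7.10}$$

Further, by part (ii) and condition (C.4.a), for any positive $\epsilon$, there exists $\eta > 0$ such that

$$P\left\{\left\|\widetilde{\theta}_\phi - \theta^*\right\| > \epsilon\right\} \le P\left\{P_0 m\left(\widetilde{\theta}_\phi,\widetilde{c}_{\widetilde{\theta}_\phi}\right) - P_0 m\left(\theta^*,c_{\theta^*}\right) > \eta\right\}.$$

The RHS term, under condition (C.3), tends to 0 by (7.10). This concludes the proof of Proposition 3.3.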



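7.5. Proof of Theorem 3.4. Since $P_0 \in \mathcal{M}$, then $c_{\theta_0} = c$. Some calculus yields

$$\frac{\partial}{\partial t} m(\theta_0,c) = \left[0, -g_1(\theta_0), \ldots, -g_l(\theta_0)\right]^T \tag{7.11}$$

and

$$\frac{\partial}{\partial\theta} m(\theta,t) = -\sum_{i=1}^{l} t_i\,\psi'\left(t^T\overline{g}(\theta)\right)\frac{\partial}{\partial\theta} g_i(\theta), \qquad \frac{\partial}{\partial\theta} m(\theta_0,c) = 0_d,$$

$$\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c) = \left[0_d, -\frac{\partial}{\partial\theta} g_1(\theta_0), \ldots, -\frac{\partial}{\partial\theta} g_l(\theta_0)\right], \qquad \frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c) = \left[\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right]^T,$$

$$\frac{\partial^2}{\partial\theta^2} m(\theta_0,c) = \left[0_d, \ldots, 0_d\right],$$

$$\frac{\partial^2}{\partial t^2} m(\theta_0,c) = -\frac{1}{\varphi''(1)}\left[\overline{g}_i(\theta_0)\,\overline{g}_j(\theta_0)\right]_{i,j=0,1,\ldots,l} =: -\frac{1}{\varphi''(1)}\,\overline{g}(\theta_0)\,\overline{g}(\theta_0)^T.$$

Integrating w.r.t. $P_0$, we obtain

$$P_0\frac{\partial}{\partial t} m(\theta_0,c) = 0_{l+1}, \qquad P_0\frac{\partial}{\partial\theta} m(\theta_0,c) = 0_d, \qquad P_0\frac{\partial^2}{\partial\theta^2} m(\theta_0,c) = \left[0_d, \ldots, 0_d\right], \tag{7.12}$$

$$P_0\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c) = -\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right], \tag{7.13}$$

$$P_0\frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c) = -\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T, \tag{7.14}$$

and

$$P_0\frac{\partial^2}{\partial t^2} m(\theta_0,c) = -\frac{1}{\varphi''(1)}\left[P_0\,\overline{g}_i(\theta_0)\,\overline{g}_j(\theta_0)\right]_{i,j=0,1,\ldots,l} = -\frac{1}{\varphi''(1)}\begin{pmatrix} 1 & 0_l^T \\ 0_l & P_0\, g(\theta_0)g(\theta_0)^T \end{pmatrix}. \tag{7.15}$$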
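By the very definition of $\widehat{\theta}_\phi$ and $\widehat{c}_{\widehat{\theta}_\phi}$, they both obey

$$P_n\frac{\partial}{\partial t} m(\theta,t) = 0, \qquad P_n\frac{\partial}{\partial\theta} m(\theta,t(\theta)) = 0,$$

i.e.,

$$P_n\frac{\partial}{\partial t} m\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right) = 0,$$
$$P_n\frac{\partial}{\partial\theta} m\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right) + \left[\frac{\partial}{\partial\theta} t\left(\widehat{\theta}_\phi\right)\right]^T P_n\frac{\partial}{\partial t} m\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right) = 0.$$

The second term in the left hand side of the second equation is equal to 0, due to the first equation.
Hence $\widehat{c}_{\widehat{\theta}_\phi}$ and $\widehat{\theta}_\phi$ are solutions of the somewhat simpler system

$$P_n\frac{\partial}{\partial t} m\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right) = 0 \quad (E1), \qquad P_n\frac{\partial}{\partial\theta} m\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right) = 0 \quad (E2).$$

Use a Taylor expansion in (E1); there exists $(\theta_n,t_n)$ inside the segment that links $\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right)$ and
$(\theta_0,c)$ such that

$$0 = P_n\frac{\partial}{\partial t} m(\theta_0,c) + \left[\left(P_n\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T,\ \left(P_n\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T\right] a_n + \frac{1}{2}\,a_n^T A_n\, a_n, \tag{7.16}$$

with

$$a_n := \left(\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right)^T,\ \left(\widehat{\theta}_\phi - \theta_0\right)^T\right)^T \tag{7.17}$$

and, writing $u := (t^T,\theta^T)^T$ for the stacked argument,

$$A_n := \left[P_n\frac{\partial^3}{\partial u_i\,\partial u_j\,\partial t}\, m(\theta_n,t_n)\right]_{i,j}. \tag{7.18}$$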

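By (A.5), the LLN implies that $A_n = O_P(1)$. So using (A.4), we can write the last term in the right
hand side of (7.16) as $o_P(1)\,a_n$. On the other hand, by (A.6), we can also write

$$\left[\left(P_n\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T,\ \left(P_n\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T\right] \quad\text{as}\quad \left[\left(P_0\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T,\ \left(P_0\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T\right] + o_P(1)$$

to obtain from (7.16)

$$-P_n\frac{\partial}{\partial t} m(\theta_0,c) = \left[\left(P_0\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T + o_P(1),\ \left(P_0\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T + o_P(1)\right] a_n. \tag{7.19}$$

In the same way, using a Taylor expansion in (E2), there exists $(\theta_n,t_n)$ inside the segment that
links $\left(\widehat{\theta}_\phi,\widehat{c}_{\widehat{\theta}_\phi}\right)$ and $(\theta_0,c)$ such that

$$0 = P_n\frac{\partial}{\partial\theta} m(\theta_0,c) + \left[\left(P_n\frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c)\right)^T,\ \left(P_n\frac{\partial^2}{\partial\theta^2} m(\theta_0,c)\right)^T\right] a_n + \frac{1}{2}\,a_n^T B_n\, a_n, \tag{7.20}$$

with, writing $u := (t^T,\theta^T)^T$ for the stacked argument,

$$B_n := \left[P_n\frac{\partial^3}{\partial u_i\,\partial u_j\,\partial\theta}\, m(\theta_n,t_n)\right]_{i,j}.$$

As in (7.19), we obtain

$$-P_n\frac{\partial}{\partial\theta} m(\theta_0,c) = \left[\left(P_0\frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c)\right)^T + o_P(1),\ \left(P_0\frac{\partial^2}{\partial\theta^2} m(\theta_0,c)\right)^T + o_P(1)\right] a_n. \tag{7.21}$$

From (7.19) and (7.21), we get

$$\sqrt{n}\,a_n = \begin{pmatrix} \left(P_0\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T & \left(P_0\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T \\ \left(P_0\frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c)\right)^T & \left(P_0\frac{\partial^2}{\partial\theta^2} m(\theta_0,c)\right)^T \end{pmatrix}^{-1} \sqrt{n}\begin{pmatrix} -P_n\frac{\partial}{\partial t} m(\theta_0,c) \\ -P_n\frac{\partial}{\partial\theta} m(\theta_0,c) \end{pmatrix} + o_P(1). \tag{7.22}$$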



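Denote by $S$ the $(l+1+d)\times(l+1+d)$-matrix defined by

$$S := \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} := \begin{pmatrix} \left(P_0\frac{\partial^2}{\partial t^2} m(\theta_0,c)\right)^T & \left(P_0\frac{\partial^2}{\partial\theta\,\partial t} m(\theta_0,c)\right)^T \\ \left(P_0\frac{\partial^2}{\partial t\,\partial\theta} m(\theta_0,c)\right)^T & \left(P_0\frac{\partial^2}{\partial\theta^2} m(\theta_0,c)\right)^T \end{pmatrix}. \tag{7.23}$$

We have

$$S_{11} = -\frac{1}{\varphi''(1)}\begin{pmatrix} 1 & 0_l^T \\ 0_l & P_0\, g(\theta_0)g(\theta_0)^T \end{pmatrix}, \tag{7.24}$$

$$S_{12} = -\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T, \tag{7.25}$$

$$S_{21} = -\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right] \quad\text{and}\quad S_{22} = P_0\frac{\partial^2}{\partial\theta^2} m(\theta_0,c) = \left[0_d,\ldots,0_d\right]. \tag{7.26}$$

The inverse matrix $S^{-1}$ of the matrix $S$ writes

$$S^{-1} = \begin{pmatrix} S_{11}^{-1} + S_{11}^{-1}S_{12}S_{22.1}^{-1}S_{21}S_{11}^{-1} & -S_{11}^{-1}S_{12}S_{22.1}^{-1} \\ -S_{22.1}^{-1}S_{21}S_{11}^{-1} & S_{22.1}^{-1} \end{pmatrix}, \tag{7.27}$$

where

$$S_{22.1} := -S_{21}S_{11}^{-1}S_{12} = \varphi''(1)\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]\begin{pmatrix} 1 & 0_l^T \\ 0_l & \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix}\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T. \tag{7.28}$$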
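The partitioned inverse (7.27) with a vanishing lower-right block, as in (7.26), can be sanity-checked numerically. The sketch below is only an illustration: it replaces each block by a scalar (a hypothetical 2 x 2 matrix with $S_{22} = 0$), builds the four blocks of the inverse from the formula, and multiplies back. The numerical values are arbitrary and not taken from the paper.

```python
# Scalar stand-ins for the blocks of S (hypothetical values; S22 = 0 as in (7.26))
s11, s12, s21, s22 = -2.0, -0.5, -0.5, 0.0

# Schur complement: S22 - S21 * S11^{-1} * S12, which reduces to
# -S21 * S11^{-1} * S12 because S22 vanishes
s221 = s22 - s21 * (1.0 / s11) * s12

# Blocks of the inverse, following the partitioned-inverse formula
i11 = 1.0 / s11 + (1.0 / s11) * s12 * (1.0 / s221) * s21 * (1.0 / s11)
i12 = -(1.0 / s11) * s12 * (1.0 / s221)
i21 = -(1.0 / s221) * s21 * (1.0 / s11)
i22 = 1.0 / s221

# Multiply S by the candidate inverse; the result should be the identity
product = [
    [s11 * i11 + s12 * i21, s11 * i12 + s12 * i22],
    [s21 * i11 + s22 * i21, s21 * i12 + s22 * i22],
]
```

With matrix-valued blocks the same formula applies as long as the order of the factors is kept, since the blocks no longer commute.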



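From (7.22), using (7.23) and (7.27), we can write

$$\sqrt{n}\begin{pmatrix} \widehat{c}_{\widehat{\theta}_\phi} - c \\ \widehat{\theta}_\phi - \theta_0 \end{pmatrix} = \begin{pmatrix} S_{11}^{-1} + S_{11}^{-1}S_{12}S_{22.1}^{-1}S_{21}S_{11}^{-1} & -S_{11}^{-1}S_{12}S_{22.1}^{-1} \\ -S_{22.1}^{-1}S_{21}S_{11}^{-1} & S_{22.1}^{-1} \end{pmatrix}\sqrt{n}\begin{pmatrix} -P_n\frac{\partial}{\partial t} m(\theta_0,c) \\ -P_n\frac{\partial}{\partial\theta} m(\theta_0,c) \end{pmatrix} + o_P(1). \tag{7.29}$$

Note that

$$\sqrt{n}\begin{pmatrix} -P_n\frac{\partial}{\partial t} m(\theta_0,c) \\ -P_n\frac{\partial}{\partial\theta} m(\theta_0,c) \end{pmatrix}, \tag{7.30}$$

under assumption (A.6), by the Central Limit Theorem, converges in distribution to a centered
multivariate normal variable with covariance matrix

$$M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}, \tag{7.31}$$

where

$$M_{11} := \begin{pmatrix} 0 & 0_l^T \\ 0_l & P_0\, g(\theta_0)g(\theta_0)^T \end{pmatrix}, \quad M_{12} := \left[0_{l+1},\ldots,0_{l+1}\right], \quad M_{21} := M_{12}^T \quad\text{and}\quad M_{22} := \left[0_d,\ldots,0_d\right]. \tag{7.32}$$

Hence, from (7.30), we deduce that

$$\sqrt{n}\begin{pmatrix} \widehat{c}_{\widehat{\theta}_\phi} - c \\ \widehat{\theta}_\phi - \theta_0 \end{pmatrix} \tag{7.33}$$

converges in distribution to a centered multivariate normal variable with covariance matrix

$$C = S^{-1} M \left[S^{-1}\right]^T := \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \tag{7.34}$$
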


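and using (7.32) and some algebra, we get

$$C_{11} = \varphi''(1)^2\begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix} - \varphi''(1)^2\begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix}\left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T \left[C_{22}\right] \left[0_d,\ P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]\begin{pmatrix} 0 & 0_l^T \\ 0_l & \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} \end{pmatrix}, \tag{7.35}$$

$$C_{12} = \left[0_{l+1},\ldots,0_{l+1}\right], \qquad C_{21} = \left[0_d,\ldots,0_d\right] \tag{7.36}$$

and

$$C_{22} = \left\{\left[P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]\left[P_0\left(g(\theta_0)g(\theta_0)^T\right)\right]^{-1}\left[P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T\right\}^{-1}. \tag{7.37}$$

From (7.33), we deduce that $C_{11}$ and $C_{22}$ are respectively the limit covariance matrices of $\sqrt{n}\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right)$
and $\sqrt{n}\left(\widehat{\theta}_\phi - \theta_0\right)$, i.e., $U = C_{11}$ and $V = C_{22}$. (7.36) implies that $\sqrt{n}\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right)$ and $\sqrt{n}\left(\widehat{\theta}_\phi - \theta_0\right)$
are asymptotically uncorrelated. This concludes the proof of Theorem 3.4.
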



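7.6. Proof of Theorem 3.5. Under assumptions (A.4-6), as in the proof of Theorem 3.4, we
obtain

$$\sqrt{n}\begin{pmatrix} \widehat{c}_{\widehat{\theta}_\phi} - c_{\theta^*} \\ \widehat{\theta}_\phi - \theta^* \end{pmatrix} = S^{-1}\sqrt{n}\begin{pmatrix} -P_n\frac{\partial}{\partial t} m(\theta^*,c_{\theta^*}) \\ -P_n\frac{\partial}{\partial\theta} m(\theta^*,c_{\theta^*}) \end{pmatrix} + o_P(1),$$

and the CLT concludes the proof.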



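7.7. Proof of Theorem 4.1. Using a Taylor expansion at $(c,\theta_0)$, we get

$$\widehat{F}_n(x) := \sum_{i=1}^n \widehat{Q}(X_i)\,1_{(-\infty,x]}(X_i) := \frac{1}{n}\sum_{i=1}^n \psi'\left(\widehat{c}_{\widehat{\theta}_\phi}^T\,\overline{g}(X_i,\widehat{\theta}_\phi)\right)1_{(-\infty,x]}(X_i)$$
$$\qquad = F_n(x) + \frac{1}{\varphi''(1)}\left[P_n\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]^T\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right) + o_P(\delta_n), \tag{7.38}$$

where

$$\delta_n := \left\|\begin{pmatrix} \widehat{c}_{\widehat{\theta}_\phi} - c \\ \widehat{\theta}_\phi - \theta_0 \end{pmatrix}\right\|,$$

which by Theorem 3.4 is equal to $O_P(1/\sqrt{n})$. Hence, (7.38) yields

$$\sqrt{n}\left(\widehat{F}_n(x) - F(x)\right) = \sqrt{n}\left(F_n(x) - F(x)\right) + \frac{1}{\varphi''(1)}\left[P_0\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]^T\sqrt{n}\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right) + o_P(1). \tag{7.39}$$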

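On the other hand, from (7.29), we get

$$\sqrt{n}\left(\widehat{c}_{\widehat{\theta}_\phi} - c\right) = H\sqrt{n}\left(-P_n\frac{\partial}{\partial t} m(\theta_0,c)\right) + o_P(1) \tag{7.40}$$

with

$$H := S_{11}^{-1} + S_{11}^{-1}S_{12}S_{22.1}^{-1}S_{21}S_{11}^{-1}. \tag{7.41}$$

We will use $f(\cdot)$ to denote the function $1_{(-\infty,x]}(\cdot) - F(x)$, for all $x \in \mathbb{R}$. Substituting (7.40) in
(7.39), we get

$$\sqrt{n}\left(\widehat{F}_n(x) - F(x)\right) = \sqrt{n}\,P_n f + \frac{1}{\varphi''(1)}\left[P_0\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]^T H \times \sqrt{n}\left(-P_n\frac{\partial}{\partial t} m(\theta_0,c)\right) + o_P(1). \tag{7.42}$$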



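By the Multivariate Central Limit Theorem, the vector
$\left(\sqrt{n}\,P_n f,\ \left[\sqrt{n}\left(-P_n\frac{\partial}{\partial t} m(\theta_0,c)\right)\right]^T\right)^T$ converges in distribution to a centered multivariate normal variable,
which implies that $\sqrt{n}\left(\widehat{F}_n(x) - F(x)\right)$ is asymptotically a centered normal variable. We now calculate
its limit variance, denoted $W(x)$:

$$W(x) = F(x)(1 - F(x)) + \frac{1}{\varphi''(1)^2}\left[P_0\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]^T U \left[P_0\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]$$
$$\qquad + 2\,\frac{1}{\varphi''(1)}\left[P_0\left(\overline{g}(\theta_0)\,1_{(-\infty,x]}\right)\right]^T H\, P_0\left(-\frac{\partial}{\partial t} m(\theta_0,c)\,1_{(-\infty,x]}\right). \tag{7.43}$$

Use the explicit forms of $\frac{\partial}{\partial t} m(\theta_0,c)$, the matrices $U$ and $V$ and some algebra to obtain

$$W(x) = F(x)(1 - F(x)) - \left[P_0\left(g(\theta_0)\,1_{(-\infty,x]}\right)\right]^T \Gamma \left[P_0\left(g(\theta_0)\,1_{(-\infty,x]}\right)\right],$$

with

$$\Gamma := \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1} - \left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1}\left[P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]^T V \left[P_0\frac{\partial}{\partial\theta} g(\theta_0)\right]\left[P_0\, g(\theta_0)g(\theta_0)^T\right]^{-1}.$$

This concludes the proof of Theorem 4.1.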
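The estimate $\widehat{F}_n(x)$ studied in this proof is a reweighted empirical distribution function: each indicator $1_{(-\infty,x]}(X_i)$ is multiplied by the projection mass at $X_i$ instead of $1/n$. The sketch below only illustrates that definition. The function name, the sample, and the "tilted" weights are hypothetical, and the weights are typed in by hand rather than computed from a divergence projection.

```python
def weighted_ecdf(xs, weights, x):
    """Return sum_i weights[i] * 1{xs[i] <= x}.

    With weights equal to 1/n this is the usual empirical distribution
    function; with projection weights it has the form of the reweighted
    estimate discussed in the proof.
    """
    if len(xs) != len(weights):
        raise ValueError("xs and weights must have the same length")
    return sum(w for xi, w in zip(xs, weights) if xi <= x)


xs = [0.3, 1.2, 2.5, 0.9, 1.7]
uniform = [1.0 / len(xs)] * len(xs)
tilted = [0.30, 0.15, 0.05, 0.30, 0.20]   # hypothetical weights summing to one

f_uniform = weighted_ecdf(xs, uniform, 1.0)   # two of five points are <= 1.0
f_tilted = weighted_ecdf(xs, tilted, 1.0)     # mass carried by 0.3 and 0.9
```

The two values differ only through the weights, which is the mechanism behind the correction term subtracted from $F(x)(1 - F(x))$ in $W(x)$.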
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